gbyuvd committed on
Commit 058f009 · verified · 1 Parent(s): 73843ba

Update README.md

Files changed (1)
  1. README.md +33 -6
README.md CHANGED
@@ -13,7 +13,7 @@ tags:
13
 
14
  # 🧬 ChemQ3MTP-base
15
 
16
- ChemQ3MTP-base is a lightweight generative model for chemistry, built on a mini **Qwen2-like** backbone with a **multi-horizon predictive loss** for molecular SELFIES representations.
17
 
18
  Current version (0.1) (Lic for Code: MIT; Weights: Apache 2.0)
19
 
@@ -24,17 +24,40 @@ A custom Qwen2-style language model, adapted for molecular generation:
24
  - ✅ **Horizon Loss** – Weighted multi-horizon objectives for long-term coherence
25
  - ✅ **SELFIES-native Tokenizer** – Robust encoding with [FastChemTokenizer](https://github.com/gbyuvd/FastChemTokenizer)
26
  - ✅ **Ranger21 Optimizer** – Warmup/warmdown scheduling for stable training
27
- - ✅ **Gradient Checkpointing & Streaming Dataset Loader** – Lightweight, hardware-friendly, optimized for rapid RL prototyping
28
  - ✅ **Durrant's Lab Filter** – Integrated substructure filtering based on [gypsum_dl](https://github.com/durrantlab/gypsum_dl/) (Ropp _et al._ 2019) methodology to remove improbable molecular variants in validity check
29
  - ✅ **Pareto Reward Controller** – Ready for RL fine-tuning with dynamic multi-objective optimization balancing validity, synthesizability, and molecular complexity with adaptive weight adjustment
30
 
31
  ---
32
  > 💡 **Target domain:** molecular generation (SELFIES).
33
- > 🔬 **Goal:** molecules that are valid, bioaware, and synthetically accessible.
34
- > 🚀 **Core innovation:** fast, modular prototyping of **MTP + RL fine-tuning pipelines** using standard HuggingFace components.
35
-
36
  ---
37
 
38
  ## Usage
39
  Requirements:
40
 
@@ -190,10 +213,14 @@ if mol is not None:
190
  else:
191
  print("❌ Could not create molecule from generated SMILES")
192
  ```
193
- ## ⚙️ Model Eval
194
 
195
  ---
196
 
197
  ## ❤️ Support the Project
198
 
199
  Training and scaling require significant computational resources.
 
13
 
14
  # 🧬 ChemQ3MTP-base
15
 
16
+ ChemQ3MTP-base is a lightweight generative model for chemistry, trained on 2.3M valid bioactive and natural-product molecules and built on a mini **Qwen2-like** backbone with a **multi-horizon predictive loss** for molecular SELFIES representations.
17
 
18
  Current version (0.1) (Lic for Code: MIT; Weights: Apache 2.0)
19
 
 
24
  - ✅ **Horizon Loss** – Weighted multi-horizon objectives for long-term coherence
25
  - ✅ **SELFIES-native Tokenizer** – Robust encoding with [FastChemTokenizer](https://github.com/gbyuvd/FastChemTokenizer)
26
  - ✅ **Ranger21 Optimizer** – Warmup/warmdown scheduling for stable training
27
+ - ✅ **Gradient Checkpointing** – Lightweight, hardware-friendly, optimized for rapid RL prototyping
28
+
29
+ RL-Ready Features:
30
  - ✅ **Durrant's Lab Filter** – Integrated substructure filtering based on [gypsum_dl](https://github.com/durrantlab/gypsum_dl/) (Ropp _et al._ 2019) methodology to remove improbable molecular variants in validity check
31
  - ✅ **Pareto Reward Controller** – Ready for RL fine-tuning with dynamic multi-objective optimization balancing validity, synthesizability, and molecular complexity with adaptive weight adjustment
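
The feature list names a weighted multi-horizon objective but does not say how it is computed. As a rough illustration only, the sketch below assumes one prediction head per horizon, averages the cross-entropy within each horizon, and combines the horizons with a weighted sum. The function names, the data layout, and the normalization by the weight total are hypothetical and are not taken from the ChemQ3MTP code.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def multi_horizon_loss(logits_per_horizon, tokens, weights):
    """Weighted average over horizons of the mean cross-entropy.

    Assumed layout (hypothetical): logits_per_horizon[h - 1][t] holds the
    logits that position t emits for the token at position t + h.
    """
    total = 0.0
    for h, (head_logits, w) in enumerate(zip(logits_per_horizon, weights), start=1):
        # Positions too close to the end have no target h steps ahead.
        losses = [
            cross_entropy(head_logits[t], tokens[t + h])
            for t in range(len(tokens) - h)
        ]
        if losses:
            total += w * sum(losses) / len(losses)
    # Assumption: normalize so the result stays on a per-token scale.
    return total / sum(weights)


# Toy example: two horizons, with the nearer horizon weighted more heavily.
tokens = [0, 1, 2, 3, 0]
uniform = [[0.0, 0.0, 0.0, 0.0] for _ in tokens]
loss = multi_horizon_loss([uniform, uniform], tokens, weights=[1.0, 0.5])
```

With uniform logits every horizon contributes the same per-token loss, so the weights only matter once the heads differ in how well they predict.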
32
 
33
  ---
34
  > 💡 **Target domain:** molecular generation (SELFIES).
35
+ > 🔬 **Goal:** a general base model that is knowledgeable about, and capable of generating, SELFIES representations of new molecules.
36
+ > 🚀 **Core innovation:** fast, modular **MTP + RL fine-tuning pipelines** using standard HuggingFace components.
 
37
  ---
38
 
39
+ # Disclaimer and Responsible Use Policy
40
+ **Model Purpose**: This generative model is designed exclusively for research and development applications in drug discovery and materials science. The model is intended to assist researchers in hypothesis generation, molecular design, and materials exploration.
41
+
42
+ **Limitations and Accuracy**:
43
+
44
+ The model's outputs are predictions and should be validated through experimental verification
45
+ The author makes no warranties regarding the accuracy, completeness, reliability, or suitability of generated results
46
+ Users assume all risks associated with model outputs and their applications
47
+
48
+ **Prohibited Uses**:
49
+
50
+ The model must not be used for:
51
+
52
+ Legal, medical, or regulatory decision-making without proper validation
53
+ Generating dangerous, toxic, or harmful compounds without appropriate safety measures
54
+ Any illegal activities or purposes
55
+ Military, defense, or weapons development applications
56
+ Circumventing safety regulations or ethical guidelines
57
+ **Compliance**: Users are responsible for ensuring compliance with applicable laws, regulations, and institutional policies in their jurisdiction.
58
+
59
+ **Liability**: The author disclaims all liability for damages arising from the use or misuse of this model.
60
+
61
  ## Usage
62
  Requirements:
63
 
 
213
  else:
214
  print("❌ Could not create molecule from generated SMILES")
215
  ```
 
216
 
217
  ---
218
 
219
+
220
+
221
+ ## ⚙️ Model Eval
222
+ - Perplexity on unseen set:
223
+
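
The perplexity value is not filled in at this commit. For reference, perplexity is conventionally the exponential of the mean per-token negative log-likelihood. The sketch below shows that arithmetic on made-up log-probabilities; the `perplexity` helper and its input layout are assumptions for illustration, not part of the project's evaluation code.

```python
import math


def perplexity(token_log_probs):
    """exp(mean negative log-likelihood) over all tokens in all sequences.

    `token_log_probs` is a list of sequences, each a list of natural-log
    probabilities the model assigned to the true next token (hypothetical layout).
    """
    n_tokens = sum(len(seq) for seq in token_log_probs)
    if n_tokens == 0:
        raise ValueError("no tokens to score")
    nll = -sum(lp for seq in token_log_probs for lp in seq)
    return math.exp(nll / n_tokens)


# Made-up example: every held-out token receives probability 0.25,
# so the perplexity comes out close to 4.
held_out = [[math.log(0.25)] * 3, [math.log(0.25)] * 2]
ppl = perplexity(held_out)
```

Averaging over tokens rather than over sequences keeps long molecules from being under-weighted, which matters when SELFIES strings vary widely in length.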
224
  ## ❤️ Support the Project
225
 
226
  Training and scaling require significant computational resources.