Model save

Files changed (5)

README.md CHANGED

@@ -14,7 +14,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 3.2544
+- Loss: 3.2500
 ## Model description
@@ -44,28 +44,31 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch | Step  | Validation Loss |
-|:-------------:|:-----:|:-----:|:---------------:|
-| 3.8254        | 0.064 | 5000  | 3.8259          |
-| 3.5877        | 0.128 | 10000 | 3.6043          |
-| 3.4987        | 0.192 | 15000 | 3.5176          |
-| 3.4495        | 0.256 | 20000 | 3.4692          |
-| 3.4157        | 0.32  | 25000 | 3.4390          |
-| 3.4003        | 0.384 | 30000 | 3.4159          |
-| 3.382         | 0.448 | 35000 | 3.4012          |
-| 3.3649        | 0.512 | 40000 | 3.3881          |
-| 3.3592        | 0.576 | 45000 | 3.3781          |
-| 3.3518        | 0.64  | 50000 | 3.3702          |
-| 3.3447        | 0.704 | 55000 | 3.3609          |
-| 3.3348        | 0.768 | 60000 | 3.3474          |
-| 3.2925        | 0.832 | 65000 | 3.3085          |
-| 3.2576        | 0.896 | 70000 | 3.2757          |
-| 3.2402        | 0.96  | 75000 | 3.2544          |
+| Training Loss | Epoch  | Step  | Validation Loss |
+|:-------------:|:------:|:-----:|:---------------:|
+| 3.8685        | 0.0533 | 5000  | 3.8281          |
+| 3.6486        | 0.1067 | 10000 | 3.6086          |
+| 3.5596        | 0.16   | 15000 | 3.5224          |
+| 3.5262        | 0.2133 | 20000 | 3.4727          |
+| 3.4856        | 0.2667 | 25000 | 3.4395          |
+| 3.4585        | 0.32   | 30000 | 3.4181          |
+| 3.4457        | 0.3733 | 35000 | 3.4013          |
+| 3.4247        | 0.4267 | 40000 | 3.3865          |
+| 3.4283        | 0.48   | 45000 | 3.3743          |
+| 3.4162        | 0.5333 | 50000 | 3.3662          |
+| 3.4076        | 0.5867 | 55000 | 3.3583          |
+| 3.3995        | 0.64   | 60000 | 3.3524          |
+| 3.3996        | 0.6933 | 65000 | 3.3458          |
+| 3.3921        | 0.7467 | 70000 | 3.3410          |
+| 3.3648        | 0.8    | 75000 | 3.3157          |
+| 3.3406        | 0.8533 | 80000 | 3.2877          |
+| 3.3174        | 0.9067 | 85000 | 3.2641          |
+| 3.2925        | 0.96   | 90000 | 3.2500          |
 ### Framework versions
-- Transformers 4.57.0.dev0
+- Transformers 4.57.0
 - Pytorch 2.8.0+cu129
 - Datasets 3.6.0
 - Tokenizers 0.22.1
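
Note on the README change: both runs stop at epoch 0.96, but the old run gets there in 75000 steps and the new one in 90000, so the epoch length differs between runs. The sketch below recovers the implied steps per epoch from the last row of each table. The row values are copied from the tables. The function name `steps_per_epoch` is illustrative, and since the Epoch column is rounded the result is an estimate. The diff does not show why the epoch got longer (for example a different batch size or dataset size), so no cause is assumed here.

```python
# Last logged (step, epoch) row of each training-results table in this diff.
OLD_LAST_ROW = (75000, 0.96)
NEW_LAST_ROW = (90000, 0.96)


def steps_per_epoch(step: int, epoch: float) -> int:
    """Estimate the number of optimizer steps in one full epoch.

    Uses a single logged row; the Epoch column is rounded in the README,
    so the estimate is rounded to the nearest integer.
    """
    return round(step / epoch)


old_steps = steps_per_epoch(*OLD_LAST_ROW)
new_steps = steps_per_epoch(*NEW_LAST_ROW)

# Relative growth in epoch length between the two runs.
growth = new_steps / old_steps

# Change in final validation loss reported in the README header lines.
loss_delta = round(3.2544 - 3.2500, 4)

print(old_steps, new_steps, growth, loss_delta)
```

Under these numbers the new run takes about 20% more steps per epoch and ends with a validation loss lower by 0.0044.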

config.json CHANGED

@@ -3,13 +3,13 @@
     "NeoLLMForCausalLM"
   ],
   "attention_bias": false,
-  "attention_dropout": 0.0,
+  "attention_dropout": 0.1,
   "auto_map": {
     "AutoConfig": "configuration_neollm.NeoLLMConfig",
     "AutoModel": "modeling_neollm.NeoLLMModel",
     "AutoModelForCausalLM": "modeling_neollm.NeoLLMForCausalLM"
   },
-  "dropout_rate": 0.0,
+  "dropout_rate": 0.1,
   "dtype": "bfloat16",
   "eos_token_id": 151645,
   "fan_ratio": 0.125,
@@ -47,6 +47,6 @@
   "rms_norm_eps": 1e-06,
   "rope_scaling": null,
   "rope_theta": 10000.0,
-  "transformers_version": "4.57.0.dev0",
+  "transformers_version": "4.57.0",
   "vocab_size": 151665
 }
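
Note on the config.json change: three keys differ between the two revisions. A minimal sketch of how that could be checked programmatically, using only the keys visible in the hunks of this diff (the full config has more fields that are not shown here). The helper name `changed_keys` is illustrative and not part of any library.

```python
import json

# Only the keys that appear as -/+ lines in this diff; the rest of
# config.json is not reproduced here.
OLD_CONFIG = json.loads("""
{
  "attention_dropout": 0.0,
  "dropout_rate": 0.0,
  "transformers_version": "4.57.0.dev0",
  "vocab_size": 151665
}
""")

NEW_CONFIG = json.loads("""
{
  "attention_dropout": 0.1,
  "dropout_rate": 0.1,
  "transformers_version": "4.57.0",
  "vocab_size": 151665
}
""")


def changed_keys(old: dict, new: dict) -> dict:
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


diff = changed_keys(OLD_CONFIG, NEW_CONFIG)
for key, (before, after) in diff.items():
    print(f"{key}: {before!r} -> {after!r}")
```

Both dropout values move from 0.0 to 0.1. How `modeling_neollm` applies `attention_dropout` and `dropout_rate` is not visible in this diff.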

generation_config.json CHANGED

@@ -4,5 +4,5 @@
     151645
   ],
   "pad_token_id": 151643,
-  "transformers_version": "4.57.0.dev0"
+  "transformers_version": "4.57.0"
 }

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f2b7df33ffc0fd82f9e293b00783d8193039e5d145609168535212fa6b4523ea
+oid sha256:83957e75447ef4c2b1dcd2dac38119d174aeab7d9075cdab8e1107f2aceebfe4
 size 245234560
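
Note on the model.safetensors change: the diff shows Git LFS pointer files, not the weights themselves. The `oid` changes while `size` stays at 245234560 bytes, which fits retrained weights in the same architecture. A minimal sketch of how a downloaded file could be checked against the new pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are illustrative, and the local file path in the usage note is hypothetical.

```python
import hashlib
from pathlib import Path

# New pointer contents, copied from this diff.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:83957e75447ef4c2b1dcd2dac38119d174aeab7d9075cdab8e1107f2aceebfe4
size 245234560
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" lines) into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Check that a local file has the size and hash recorded in the pointer."""
    p = Path(path)
    if p.stat().st_size != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with p.open("rb") as f:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(NEW_POINTER)
```

With a local copy of the weights, `matches_pointer("model.safetensors", pointer)` would return `True` only if the file is byte-for-byte the revision this commit points to.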

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:376e1ae04e84cca9a92e134ee9edbbd668e7aa2e9d98dfb5c05a99260bd967e6
-size 5585
+oid sha256:25c46a2850e36fe6a43353ef94521e9dbda2ddad2b2211a691e88e5726c689b7
+size 6033