Model save

Files changed (4)

README.md CHANGED

@@ -14,7 +14,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 3.5599
+- Loss: 3.3881
 ## Model description
@@ -44,11 +44,17 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch | Step  | Validation Loss |
-|:-------------:|:-----:|:-----:|:---------------:|
-| 3.8769        | 0.32  | 5000  | 3.8478          |
-| 3.6663        | 0.64  | 10000 | 3.6390          |
-| 3.5933        | 0.96  | 15000 | 3.5599          |
+| Training Loss | Epoch  | Step  | Validation Loss |
+|:-------------:|:------:|:-----:|:---------------:|
+| 3.8906        | 0.1067 | 5000  | 3.8169          |
+| 3.7452        | 0.2133 | 10000 | 3.6688          |
+| 3.6584        | 0.32   | 15000 | 3.5842          |
+| 3.6004        | 0.4267 | 20000 | 3.5230          |
+| 3.5443        | 0.5333 | 25000 | 3.4647          |
+| 3.5011        | 0.64   | 30000 | 3.4239          |
+| 3.4836        | 0.7467 | 35000 | 3.4020          |
+| 3.4549        | 0.8533 | 40000 | 3.3917          |
+| 3.4606        | 0.96   | 45000 | 3.3881          |
 ### Framework versions
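
The two training tables both end at epoch 0.96 but at different step counts, so the number of optimizer steps per epoch changed between runs. A minimal Python sketch, using only the rows shown in the diff, makes the arithmetic explicit. The variable and function names are illustrative. The diff does not show the hyperparameters, so whether the change comes from a smaller effective batch size or a larger dataset is not something this snippet can tell.

```python
# (step, epoch, validation loss) rows copied from the two tables in the diff.
old_run = [(5000, 0.32, 3.8478), (10000, 0.64, 3.6390), (15000, 0.96, 3.5599)]
new_run = [
    (5000, 0.1067, 3.8169), (10000, 0.2133, 3.6688), (15000, 0.32, 3.5842),
    (20000, 0.4267, 3.5230), (25000, 0.5333, 3.4647), (30000, 0.64, 3.4239),
    (35000, 0.7467, 3.4020), (40000, 0.8533, 3.3917), (45000, 0.96, 3.3881),
]

def steps_per_epoch(rows):
    """Estimate steps per epoch from the last logged row (step / epoch)."""
    step, epoch, _ = rows[-1]
    return step / epoch

old_spe = steps_per_epoch(old_run)   # 15000 / 0.96
new_spe = steps_per_epoch(new_run)   # 45000 / 0.96
ratio = new_spe / old_spe

# Change in the final validation loss between the two commits.
loss_delta = new_run[-1][2] - old_run[-1][2]

print(f"steps/epoch: {old_spe:.0f} -> {new_spe:.0f} (x{ratio:.1f})")
print(f"final validation loss change: {loss_delta:+.4f}")
```

At the same epoch fraction (0.96), the new run has taken three times as many steps and reports a validation loss lower by about 0.17.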

config.json CHANGED

@@ -33,10 +33,10 @@
     "full_attention"
   ],
   "linear_conv_kernel_dim": 4,
-  "linear_key_head_dim": 64,
+  "linear_key_head_dim": 32,
   "linear_num_key_heads": 8,
-  "linear_num_value_heads": 8,
-  "linear_value_head_dim": 64,
+  "linear_num_value_heads": 16,
+  "linear_value_head_dim": 32,
   "max_position_embeddings": 512,
   "model_type": "neollm",
   "num_attention_heads": 8,
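
To see what the config change does to the linear-attention projection widths, the sketch below multiplies head count by head dimension for keys and values. This assumes the common convention that total width equals `num_heads * head_dim`; the diff does not show the `neollm` modeling code, so treat that as an assumption rather than a description of the actual implementation. The `widths` helper is hypothetical.

```python
# Values copied from the old and new sides of the config.json diff.
old_cfg = {
    "linear_key_head_dim": 64,
    "linear_num_key_heads": 8,
    "linear_num_value_heads": 8,
    "linear_value_head_dim": 64,
}
new_cfg = {
    "linear_key_head_dim": 32,
    "linear_num_key_heads": 8,
    "linear_num_value_heads": 16,
    "linear_value_head_dim": 32,
}

def widths(cfg):
    """Return (key_width, value_width), ASSUMING width = heads * head_dim."""
    key_width = cfg["linear_num_key_heads"] * cfg["linear_key_head_dim"]
    value_width = cfg["linear_num_value_heads"] * cfg["linear_value_head_dim"]
    return key_width, value_width

print("old (key, value):", widths(old_cfg))
print("new (key, value):", widths(new_cfg))
```

Under that assumption the total key width halves while the total value width stays the same, because the value side trades head dimension for head count.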

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b2dd23aa08887f03bd0b2dedd3cad57b2082d48405b2ccce9a83326f7aa36639
-size 250414224
+oid sha256:9f14b10de44faaa69ef479a457dd75c927d920e17189ae8dae55f72437960e89
+size 245234560
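
The `model.safetensors` entry is a Git LFS pointer file, not the weights themselves: a few `key value` lines giving the spec version, the content hash, and the size in bytes. A small Python sketch, with a hypothetical `parse_lfs_pointer` helper, parses the two pointers from the diff and computes the size difference.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer (lines of 'key value') into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }

# Pointer contents copied from the old and new sides of the diff.
old_ptr = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:b2dd23aa08887f03bd0b2dedd3cad57b2082d48405b2ccce9a83326f7aa36639\n"
    "size 250414224\n"
)
new_ptr = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:9f14b10de44faaa69ef479a457dd75c927d920e17189ae8dae55f72437960e89\n"
    "size 245234560\n"
)

delta_bytes = new_ptr["size"] - old_ptr["size"]
print(f"size change: {delta_bytes} bytes ({delta_bytes / old_ptr['size']:.2%})")
```

The new checkpoint is about 5.2 MB (roughly 2%) smaller, which is at least consistent with the smaller key head dimension in `config.json`, although the diff does not state the parameter count or dtype.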

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8fd9d3b2d046076fdbc583cb7f50b7d44f720f28c58bef658cdd96395ef7180e
 size 5521
+oid sha256:d73ddd42d104db05486ea8abf270ee84a6d551ab624552e075f5f3a06e239b63
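
For `training_args.bin` the size is unchanged at 5521 bytes and only the `oid` differs, so a size check alone would not detect the change. The sketch below shows one way to check a downloaded file against a pointer's SHA-256 digest and size. The `matches_pointer` helper is hypothetical, and the demo uses a throwaway temporary file rather than the real artifact.

```python
import hashlib
import os
import tempfile

def matches_pointer(path, expected_sha256, expected_size):
    """Return True if the file at `path` has the given size and SHA-256 digest."""
    if os.path.getsize(path) != expected_size:
        return False
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large checkpoints need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected_sha256

# Demo on a temporary 3-byte file (not the real training_args.bin).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(b"abc")
    right_digest = hashlib.sha256(b"abc").hexdigest()
    other_digest = hashlib.sha256(b"abd").hexdigest()  # same size, different bytes
    ok = matches_pointer(path, right_digest, 3)
    changed = matches_pointer(path, other_digest, 3)

print(ok, changed)
```

The second call has the right size but the wrong digest, which mirrors the `training_args.bin` case in this commit.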