prancyFox committed on
Commit
97eef92
·
verified ·
1 Parent(s): 1cbdb5d

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. README.md +254 -0
  2. checkpoint-1000/config.json +36 -0
  3. checkpoint-1000/model.safetensors +3 -0
  4. checkpoint-1000/optimizer.pt +3 -0
  5. checkpoint-1000/rng_state.pth +3 -0
  6. checkpoint-1000/scheduler.pt +3 -0
  7. checkpoint-1000/special_tokens_map.json +7 -0
  8. checkpoint-1000/tokenizer.json +0 -0
  9. checkpoint-1000/tokenizer_config.json +58 -0
  10. checkpoint-1000/trainer_state.json +243 -0
  11. checkpoint-1000/training_args.bin +3 -0
  12. checkpoint-1000/vocab.txt +0 -0
  13. checkpoint-1200/config.json +36 -0
  14. checkpoint-1200/model.safetensors +3 -0
  15. checkpoint-1200/optimizer.pt +3 -0
  16. checkpoint-1200/rng_state.pth +3 -0
  17. checkpoint-1200/scheduler.pt +3 -0
  18. checkpoint-1200/special_tokens_map.json +7 -0
  19. checkpoint-1200/tokenizer.json +0 -0
  20. checkpoint-1200/tokenizer_config.json +58 -0
  21. checkpoint-1200/trainer_state.json +283 -0
  22. checkpoint-1200/training_args.bin +3 -0
  23. checkpoint-1200/vocab.txt +0 -0
  24. checkpoint-1400/config.json +36 -0
  25. checkpoint-1400/model.safetensors +3 -0
  26. checkpoint-1400/optimizer.pt +3 -0
  27. checkpoint-1400/rng_state.pth +3 -0
  28. checkpoint-1400/scheduler.pt +3 -0
  29. checkpoint-1400/special_tokens_map.json +7 -0
  30. checkpoint-1400/tokenizer.json +0 -0
  31. checkpoint-1400/tokenizer_config.json +58 -0
  32. checkpoint-1400/trainer_state.json +323 -0
  33. checkpoint-1400/training_args.bin +3 -0
  34. checkpoint-1400/vocab.txt +0 -0
  35. checkpoint-1600/config.json +36 -0
  36. checkpoint-1600/model.safetensors +3 -0
  37. checkpoint-1600/optimizer.pt +3 -0
  38. checkpoint-1600/rng_state.pth +3 -0
  39. checkpoint-1600/scheduler.pt +3 -0
  40. checkpoint-1600/special_tokens_map.json +7 -0
  41. checkpoint-1600/tokenizer.json +0 -0
  42. checkpoint-1600/tokenizer_config.json +58 -0
  43. checkpoint-1600/trainer_state.json +363 -0
  44. checkpoint-1600/training_args.bin +3 -0
  45. checkpoint-1600/vocab.txt +0 -0
  46. checkpoint-1800/config.json +36 -0
  47. checkpoint-1800/model.safetensors +3 -0
  48. checkpoint-1800/optimizer.pt +3 -0
  49. checkpoint-1800/rng_state.pth +3 -0
  50. checkpoint-1800/scheduler.pt +3 -0
README.md ADDED
@@ -0,0 +1,254 @@
1
+ ---
2
+ language: en
3
+ license: mit
4
+ pipeline_tag: text-classification
5
+ library_name: transformers
6
+ tags:
7
+ - spam
8
+ - ham
9
+ - email
10
+ - tinybert
11
+ - enron
12
+ - text-classification
13
+ model-index:
14
+ - name: prancyFox/tiny-bert-enron-spam
15
+ results:
16
+ - task:
17
+ type: text-classification
18
+ name: Spam / Ham Classification
19
+ dataset:
20
+ name: Enron (processed CSV)
21
+ type: enron_email
22
+ split: test
23
+ metrics:
24
+ - name: F1 (macro)
25
+ type: f1
26
+ value: 0.7666
27
+ - name: ROC-AUC
28
+ type: roc_auc
29
+ value: 0.9977
30
+ - name: Precision (spam)
31
+ type: precision
32
+ value: 0.9954
33
+ - name: Recall (spam)
34
+ type: recall
35
+ value: 0.5632
36
+ - name: Precision (ham)
37
+ type: precision
38
+ value: 0.6875
39
+ - name: Recall (ham)
40
+ type: recall
41
+ value: 0.9973
42
+ base_model: huawei-noah/TinyBERT_General_4L_312D
43
+ ---
44
+
45
+ # TinyBERT Spam Classifier (Enron)
46
+
47
+ A compact **TinyBERT (4-layer, 312 hidden)** model fine-tuned to classify **email text** as **spam** or **ham**.
48
+ Trained on an Enron-derived CSV with light email-specific cleaning (e.g., removing quoted lines and base64-like blobs).
49
+ Optimized for **low false positives** by default; adjust the decision threshold if you want higher spam recall.
50
+
51
+ > Labels: `ham` (0) and `spam` (1)
52
+
53
+ ---
54
+
55
+ ## ✨ Quick Start
56
+
57
+ ```python
58
+ from transformers import pipeline
59
+
60
+ clf = pipeline(
61
+     "text-classification",
62
+     model="prancyFox/tiny-bert-enron-spam",
63
+     truncation=True,  # recommended for long emails
64
+ )
65
+
66
+ clf("Congratulations! You won a FREE iPhone. Click here now!")
67
+ # [{'label': 'spam', 'score': 0.98}]
68
+ ```
69
+
70
+ **Batch inference**
71
+
72
+ ```python
73
+ texts = [
74
+     "Meeting moved to 3pm, see agenda attached.",
75
+     "FREE gift card!!! Act now!",
76
+ ]
77
+ preds = clf(texts, truncation=True)
78
+ ```
79
+
80
+ ---
81
+
82
+ ## 🔎 Intended Use & Limitations
83
+
84
+ **Intended use**
85
+
86
+ * Classifying **email bodies (and optionally subject+body)** as spam vs ham.
87
+ * Low-latency scenarios where a small model is preferred.
88
+
89
+ **Out of scope / Limitations**
90
+
91
+ * Non-English email content may reduce accuracy.
92
+ * Long threads with heavy quoting/footers can dilute signal (use truncation + cleaning).
93
+ * Trained on Enron-style corporate emails; consumer emails may differ (consider further fine-tuning).
94
+
95
+ ---
96
+
97
+ ## 🧰 How We Preprocessed the Data
98
+
99
+ Light normalization aimed at keeping semantic content:
100
+
101
+ * Remove long base64-like blobs.
102
+ * Drop quoted lines starting with `>` or `|`.
103
+ * Optional: concatenate `Subject + "\n" + Message` when available.
104
+ * Collapse repeated whitespace.
105
+
106
+ (You can replicate similar cleaning in your serving pipeline for alignment.)
107
+
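The cleaning steps listed above can be sketched as a small helper. This is a minimal sketch, not the project's actual preprocessing code: the function name `clean_email`, the 80-character cutoff for a "base64-like blob", and collapsing every whitespace run (newlines included) to a single space are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumption: a "base64-like blob" is a run of 80+ base64-alphabet characters.
_BLOB_RE = re.compile(r"[A-Za-z0-9+/=]{80,}")
_WS_RE = re.compile(r"\s+")


def clean_email(message: str, subject: Optional[str] = None) -> str:
    """Apply light, email-specific normalization (hypothetical helper)."""
    # Optional: concatenate Subject + "\n" + Message when a subject is given.
    text = f"{subject}\n{message}" if subject else message
    # Drop quoted lines starting with '>' or '|'.
    kept = [ln for ln in text.splitlines()
            if not ln.lstrip().startswith((">", "|"))]
    text = "\n".join(kept)
    # Remove long base64-like blobs.
    text = _BLOB_RE.sub(" ", text)
    # Collapse repeated whitespace.
    return _WS_RE.sub(" ", text).strip()
```

Applying the same function at training and serving time keeps the two input distributions aligned, which is the point of the note above.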
108
+ ---
109
+
110
+ ## 🏋️ Training Details
111
+
112
+ * **Base model:** `huawei-noah/TinyBERT_General_4L_312D`
113
+ * **Task:** Binary text classification (`ham`=0, `spam`=1)
114
+ * **Tokenizer:** fast BERT tokenizer (uncased)
115
+ * **Max length:** 256 tokens
116
+ * **Optimizer / LR:** AdamW, learning rate `2e-5 – 5e-5` (final run `3e-5`)
117
+ * **Batch size:** 32
118
+ * **Epochs:** 4 (early stopping enabled)
119
+ * **Warmup:** 10%
120
+ * **Weight decay:** 0.01
121
+ * **Loss:** Cross-entropy with class weighting (ham/spam balanced from label distribution). Focal loss available in the trainer.
122
+ * **Early stopping metric:** `eval_f1`
123
+ * **Best checkpoint:** Saved using evaluation on validation set.
124
+
125
+ > Trainer script: `train/train_tinybert.py` (TinyBERT-compatible, with legacy HF support shims).
126
+
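As an illustration of "class weighting balanced from label distribution", one common heuristic (the one scikit-learn uses for `class_weight="balanced"`) is `n_samples / (n_classes * count_of_class)`. The sketch below assumes that formula; the trainer script's exact weighting is not shown in this card, and `balanced_class_weights` is a hypothetical name.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class_id: weight} using n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}


# Toy label distribution: three ham (0) examples and one spam (1) example.
weights = balanced_class_weights([0, 0, 0, 1])
print(weights)
```

The resulting values, ordered by class id, could then be passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`, so that errors on the rarer class contribute more to the loss.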
127
+ ---
128
+
129
+ ## 📊 Evaluation (Chunked Benchmark Summary)
130
+
131
+ Metrics below reflect a **chunked evaluation** pass (used for long emails), where the model sees up to 512 tokens per chunk with overlap. Threshold tuned to minimize false positives:
132
+
133
+ ### Classification Report
134
+
135
+ | Class | Precision | Recall | F1 |
136
+ | ------------: | ---------: | ---------: | ---------: |
137
+ | ham | 0.6875 | 0.9973 | 0.8139 |
138
+ | spam | 0.9954 | 0.5632 | 0.7194 |
139
+ | **macro avg** | **0.8414** | **0.7802** | **0.7666** |
140
+
141
+ * **ROC-AUC:** 0.9977
142
+
143
+ **Confusion matrix**
144
+
145
+ ```
146
+ [[16500 45]
147
+ [ 7500 9671]]
148
+ ```
149
+
150
+ **Interpretation:** The model is conservative (very few false positives on ham). If you need to catch more spam, **lower the decision threshold** (e.g., from 0.5 → 0.35) or re-train with a spam-skewed class weight / focal loss.
151
+
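The classification report can be recomputed from the confusion matrix as a sanity check. The sketch below assumes rows are true labels and columns are predicted labels, in the order `ham`, `spam` (the usual scikit-learn convention, and the reading under which the table values above come out). The function name is illustrative, and it does not guard against empty rows or columns.

```python
def per_class_metrics(cm, labels=("ham", "spam")):
    """Precision, recall and F1 per class from a square confusion matrix."""
    out = {}
    for i, name in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column sum: predicted as class i
        actual = sum(cm[i])                    # row sum: truly class i
        precision = tp / predicted
        recall = tp / actual
        f1 = 2 * precision * recall / (precision + recall)
        out[name] = {"precision": precision, "recall": recall, "f1": f1}
    return out


cm = [[16500, 45], [7500, 9671]]
metrics = per_class_metrics(cm)
macro_f1 = sum(m["f1"] for m in metrics.values()) / len(metrics)
for name, m in metrics.items():
    print(name, {k: round(v, 4) for k, v in m.items()})
print("macro F1:", round(macro_f1, 4))
```

Read this way, the 7,500 spam emails predicted as ham account for the low spam recall, while the 45 ham emails predicted as spam are the false positives the threshold was tuned to avoid.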
152
+ ---
153
+
154
+ ## 🎛️ Threshold & Long-Email Guidance
155
+
156
+ * **Threshold:** Default is 0.5. For higher spam recall, try **0.35–0.45** and evaluate impact on false positives.
157
+ * **Long emails:** For multi-paragraph threads, consider **chunking** and aggregating chunk-level spam scores (e.g., max or average). Our reference app uses 512-token chunks with overlap.
158
+
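The threshold and chunking guidance above can be sketched with two small helpers. This is not the reference app's code: the names `chunk_ids` and `label_from_chunk_scores`, the 64-token overlap, and the 510-token window (512 minus two positions assumed reserved for `[CLS]` and `[SEP]`) are assumptions chosen for illustration.

```python
from statistics import mean


def chunk_ids(ids, size=510, overlap=64):
    """Split token ids into overlapping windows of at most `size` ids."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, max(len(ids), 1), step):
        chunks.append(ids[start:start + size])
        if start + size >= len(ids):
            break  # the last window already reaches the end of the input
    return chunks


def label_from_chunk_scores(spam_probs, threshold=0.5, how="max"):
    """Aggregate chunk-level spam probabilities and apply the threshold."""
    score = max(spam_probs) if how == "max" else mean(spam_probs)
    return ("spam" if score >= threshold else "ham"), score


# Hypothetical per-chunk spam probabilities for one long email.
print(label_from_chunk_scores([0.1, 0.4, 0.2], threshold=0.35))
print(label_from_chunk_scores([0.1, 0.4, 0.2], threshold=0.5))
```

Max aggregation flags an email if any single chunk looks like spam, which favors recall. Averaging is more conservative and stays closer to the low-false-positive default. The per-chunk spam probability can be read from the pipeline output by requesting all label scores (for example with `top_k=None`) and taking the `spam` entry.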
159
+ ---
160
+
161
+ ## 🧪 Reproducibility
162
+
163
+ **Environment**
164
+
165
+ * Python 3.10/3.11
166
+ * `transformers >= 4.40`
167
+ * `datasets >= 2.20`
168
+ * `evaluate >= 0.4.2`
169
+ * `torch >= 2.1`
170
+
171
+ **Training command (example)**
172
+
173
+ ```bash
174
+ python train/train_tinybert.py \
175
+   --train data/enron.csv \
175
+   --text_col Message --label_col "Spam/Ham" \
176
+   --output_dir outputs/tiny-bert-enron-spam \
177
+   --epochs 4 --batch_size 32 --lr 3e-5 \
178
+   --max_length 256 --fp16
180
+ ```
181
+
182
+ **Serving (FastAPI example)**
183
+
184
+ ```bash
185
+ python spam_bert.py --serve \
186
+   --model prancyFox/tiny-bert-enron-spam \
187
+   --model-cache-dir ./models_cache
188
+ ```
189
+
190
+ ---
191
+
192
+ ## 📁 Files
193
+
194
+ This repo should include:
195
+
196
+ * `config.json`
197
+ * `pytorch_model.bin` or `model.safetensors`
198
+ * `tokenizer.json` and `tokenizer_config.json` (or `vocab.txt` etc.)
199
+ * `README.md` (this file)
200
+ * (Optional) `label_mapping.json` with `{"ham": 0, "spam": 1}`
201
+
202
+ ---
203
+
204
+ ## ⚖️ License
205
+
206
+ * **Model weights & code**: MIT
207
+ * **Dataset**: Check the original Enron dataset/license terms before redistribution.
208
+
209
+ ---
210
+
211
+ ## 🔬 Ethical Considerations & Risks
212
+
213
+ * False positives can have operational cost (missed legitimate emails). This model is tuned to minimize them; if you change the threshold, validate carefully.
214
+ * Spam evolves. Periodically re-train with fresh samples to maintain accuracy.
215
+ * Non-English or code-mixed content may degrade performance.
216
+
217
+ ---
218
+
219
+ ## 🧩 Citation
220
+
221
+ If you use this model, please cite:
222
+
223
+ ```
224
+ @software{tinybert_enron_spam_2025,
225
+ title = {TinyBERT Spam Classifier (Enron)},
226
+ author = {Ing. Daniel Eder},
227
+ year = {2025},
228
+ url = {https://huggingface.co/prancyFox/tiny-bert-enron-spam}
229
+ }
230
+ ```
231
+
232
+ And the TinyBERT paper:
233
+
234
+ ```
235
+ @article{jiao2020tinybert,
236
+ title={TinyBERT: Distilling BERT for Natural Language Understanding},
237
+ author={Jiao, Xiaoqi and Yin, Yichun and others},
238
+ journal={Findings of EMNLP},
239
+ year={2020}
240
+ }
241
+ ```
242
+
243
+ ---
244
+
245
+ ## 🛠 Maintainers
246
+
247
+ * **Daniel Eder** ([[email protected]](mailto:[email protected]?subject=tiny-bert-enron-spam))
248
+
249
+ ---
250
+
251
+ ### Notes
252
+
253
+ * For a higher-recall variant, fine-tune with `--use_focal_loss` or increase the spam class weight, then re-evaluate thresholds.
254
+ * If you want a **PyTorch Lightning** or **Accelerate** training variant, the provided trainer is easy to adapt.
checkpoint-1000/config.json ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ "architectures": [
3
+ "BertForSequenceClassification"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "cell": {},
7
+ "classifier_dropout": null,
8
+ "emb_size": 312,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 312,
12
+ "id2label": {
13
+ "0": "ham",
14
+ "1": "spam"
15
+ },
16
+ "initializer_range": 0.02,
17
+ "intermediate_size": 1200,
18
+ "label2id": {
19
+ "ham": 0,
20
+ "spam": 1
21
+ },
22
+ "layer_norm_eps": 1e-12,
23
+ "max_position_embeddings": 512,
24
+ "model_type": "bert",
25
+ "num_attention_heads": 12,
26
+ "num_hidden_layers": 4,
27
+ "pad_token_id": 0,
28
+ "position_embedding_type": "absolute",
29
+ "pre_trained": "",
30
+ "structure": [],
31
+ "torch_dtype": "float32",
32
+ "transformers_version": "4.55.0",
33
+ "type_vocab_size": 2,
34
+ "use_cache": true,
35
+ "vocab_size": 30522
36
+ }
checkpoint-1000/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a31c635559db3cefbb53de0825299d4849ec2c98ffa325475aa5c1b93b52a599
3
+ size 57411808
checkpoint-1000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4f0a244f0ab5d36cbb5bcb5ee4678113bf196e7b7a105c5679d641bf0cd455d4
3
+ size 114864267
checkpoint-1000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ba6621c267fba871ab6674a5abb65fac14a998ca389540d43eca025f378fef4a
3
+ size 14455
checkpoint-1000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:691bbcc71ba9a97565707d1532995eda4e57b6dbdb0849ad042a36a2c432ffa0
3
+ size 1465
checkpoint-1000/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
checkpoint-1000/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1000/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 1000000000000000019884624838656,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
checkpoint-1000/trainer_state.json ADDED
@@ -0,0 +1,243 @@
1
+ {
2
+ "best_global_step": 1000,
3
+ "best_metric": 0.9968069666182874,
4
+ "best_model_checkpoint": "models/tinybert-enron\\checkpoint-1000",
5
+ "epoch": 1.0559662090813093,
6
+ "eval_steps": 200,
7
+ "global_step": 1000,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.05279831045406547,
14
+ "grad_norm": 0.7054851055145264,
15
+ "learning_rate": 3.878627968337731e-06,
16
+ "loss": 0.6916,
17
+ "step": 50
18
+ },
19
+ {
20
+ "epoch": 0.10559662090813093,
21
+ "grad_norm": 4.758987903594971,
22
+ "learning_rate": 7.836411609498681e-06,
23
+ "loss": 0.6419,
24
+ "step": 100
25
+ },
26
+ {
27
+ "epoch": 0.1583949313621964,
28
+ "grad_norm": 2.878147602081299,
29
+ "learning_rate": 1.179419525065963e-05,
30
+ "loss": 0.4382,
31
+ "step": 150
32
+ },
33
+ {
34
+ "epoch": 0.21119324181626187,
35
+ "grad_norm": 0.9188410639762878,
36
+ "learning_rate": 1.575197889182058e-05,
37
+ "loss": 0.2162,
38
+ "step": 200
39
+ },
40
+ {
41
+ "epoch": 0.21119324181626187,
42
+ "eval_accuracy": 0.997920997920998,
43
+ "eval_f1": 0.997965707643127,
44
+ "eval_loss": 0.11409526318311691,
45
+ "eval_precision": 0.9959396751740139,
46
+ "eval_recall": 1.0,
47
+ "eval_runtime": 77.9734,
48
+ "eval_samples_per_second": 43.181,
49
+ "eval_steps_per_second": 2.706,
50
+ "step": 200
51
+ },
52
+ {
53
+ "epoch": 0.26399155227032733,
54
+ "grad_norm": 0.42353588342666626,
55
+ "learning_rate": 1.970976253298153e-05,
56
+ "loss": 0.0759,
57
+ "step": 250
58
+ },
59
+ {
60
+ "epoch": 0.3167898627243928,
61
+ "grad_norm": 0.22129850089550018,
62
+ "learning_rate": 2.3667546174142482e-05,
63
+ "loss": 0.0466,
64
+ "step": 300
65
+ },
66
+ {
67
+ "epoch": 0.36958817317845827,
68
+ "grad_norm": 0.14293035864830017,
69
+ "learning_rate": 2.762532981530343e-05,
70
+ "loss": 0.0333,
71
+ "step": 350
72
+ },
73
+ {
74
+ "epoch": 0.42238648363252373,
75
+ "grad_norm": 0.09417829662561417,
76
+ "learning_rate": 2.982399530654151e-05,
77
+ "loss": 0.015,
78
+ "step": 400
79
+ },
80
+ {
81
+ "epoch": 0.42238648363252373,
82
+ "eval_accuracy": 0.9988119988119988,
83
+ "eval_f1": 0.9988365328679465,
84
+ "eval_loss": 0.012017497792840004,
85
+ "eval_precision": 0.9976757699012202,
86
+ "eval_recall": 1.0,
87
+ "eval_runtime": 75.9348,
88
+ "eval_samples_per_second": 44.341,
89
+ "eval_steps_per_second": 2.779,
90
+ "step": 400
91
+ },
92
+ {
93
+ "epoch": 0.4751847940865892,
94
+ "grad_norm": 0.06869468092918396,
95
+ "learning_rate": 2.938398357289528e-05,
96
+ "loss": 0.0143,
97
+ "step": 450
98
+ },
99
+ {
100
+ "epoch": 0.5279831045406547,
101
+ "grad_norm": 0.05820750445127487,
102
+ "learning_rate": 2.8943971839249047e-05,
103
+ "loss": 0.0161,
104
+ "step": 500
105
+ },
106
+ {
107
+ "epoch": 0.5807814149947201,
108
+ "grad_norm": 0.053672004491090775,
109
+ "learning_rate": 2.8503960105602817e-05,
110
+ "loss": 0.0107,
111
+ "step": 550
112
+ },
113
+ {
114
+ "epoch": 0.6335797254487856,
115
+ "grad_norm": 121.33438873291016,
116
+ "learning_rate": 2.8063948371956588e-05,
117
+ "loss": 0.0068,
118
+ "step": 600
119
+ },
120
+ {
121
+ "epoch": 0.6335797254487856,
122
+ "eval_accuracy": 0.9991089991089991,
123
+ "eval_f1": 0.999127145766657,
124
+ "eval_loss": 0.006898785941302776,
125
+ "eval_precision": 0.9982558139534884,
126
+ "eval_recall": 1.0,
127
+ "eval_runtime": 87.7806,
128
+ "eval_samples_per_second": 38.357,
129
+ "eval_steps_per_second": 2.404,
130
+ "step": 600
131
+ },
132
+ {
133
+ "epoch": 0.6863780359028511,
134
+ "grad_norm": 0.2175891101360321,
135
+ "learning_rate": 2.7623936638310355e-05,
136
+ "loss": 0.0123,
137
+ "step": 650
138
+ },
139
+ {
140
+ "epoch": 0.7391763463569165,
141
+ "grad_norm": 0.031244030222296715,
142
+ "learning_rate": 2.7183924904664125e-05,
143
+ "loss": 0.0087,
144
+ "step": 700
145
+ },
146
+ {
147
+ "epoch": 0.791974656810982,
148
+ "grad_norm": 0.028063887730240822,
149
+ "learning_rate": 2.6743913171017896e-05,
150
+ "loss": 0.0122,
151
+ "step": 750
152
+ },
153
+ {
154
+ "epoch": 0.8447729672650475,
155
+ "grad_norm": 0.025719981640577316,
156
+ "learning_rate": 2.6303901437371663e-05,
157
+ "loss": 0.0077,
158
+ "step": 800
159
+ },
160
+ {
161
+ "epoch": 0.8447729672650475,
162
+ "eval_accuracy": 0.9994059994059994,
163
+ "eval_f1": 0.9994179278230501,
164
+ "eval_loss": 0.004702265374362469,
165
+ "eval_precision": 0.9988365328679465,
166
+ "eval_recall": 1.0,
167
+ "eval_runtime": 99.4766,
168
+ "eval_samples_per_second": 33.847,
169
+ "eval_steps_per_second": 2.121,
170
+ "step": 800
171
+ },
172
+ {
173
+ "epoch": 0.8975712777191129,
174
+ "grad_norm": 0.02354113757610321,
175
+ "learning_rate": 2.5863889703725433e-05,
176
+ "loss": 0.0058,
177
+ "step": 850
178
+ },
179
+ {
180
+ "epoch": 0.9503695881731784,
181
+ "grad_norm": 0.02310008369386196,
182
+ "learning_rate": 2.5423877970079204e-05,
183
+ "loss": 0.0056,
184
+ "step": 900
185
+ },
186
+ {
187
+ "epoch": 1.0031678986272439,
188
+ "grad_norm": 0.019235238432884216,
189
+ "learning_rate": 2.498386623643297e-05,
190
+ "loss": 0.0106,
191
+ "step": 950
192
+ },
193
+ {
194
+ "epoch": 1.0559662090813093,
195
+ "grad_norm": 208.1472930908203,
196
+ "learning_rate": 2.454385450278674e-05,
197
+ "loss": 0.0124,
198
+ "step": 1000
199
+ },
200
+ {
201
+ "epoch": 1.0559662090813093,
202
+ "eval_accuracy": 0.9967329967329968,
203
+ "eval_f1": 0.9968069666182874,
204
+ "eval_loss": 0.016226448118686676,
205
+ "eval_precision": 0.9936342592592593,
206
+ "eval_recall": 1.0,
207
+ "eval_runtime": 88.1246,
208
+ "eval_samples_per_second": 38.207,
209
+ "eval_steps_per_second": 2.394,
210
+ "step": 1000
211
+ }
212
+ ],
213
+ "logging_steps": 50,
214
+ "max_steps": 3788,
215
+ "num_input_tokens_seen": 0,
216
+ "num_train_epochs": 4,
217
+ "save_steps": 200,
218
+ "stateful_callbacks": {
219
+ "EarlyStoppingCallback": {
220
+ "args": {
221
+ "early_stopping_patience": 5,
222
+ "early_stopping_threshold": 0.0005
223
+ },
224
+ "attributes": {
225
+ "early_stopping_patience_counter": 0
226
+ }
227
+ },
228
+ "TrainerControl": {
229
+ "args": {
230
+ "should_epoch_stop": false,
231
+ "should_evaluate": false,
232
+ "should_log": false,
233
+ "should_save": true,
234
+ "should_training_stop": false
235
+ },
236
+ "attributes": {}
237
+ }
238
+ },
239
+ "total_flos": 229373753097216.0,
240
+ "train_batch_size": 32,
241
+ "trial_name": null,
242
+ "trial_params": null
243
+ }
checkpoint-1000/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0f6ee5be6db1ab816abaa77671d6a299c7d2015f383c82c395377bcfdce9d1cd
3
+ size 5713
checkpoint-1000/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1200/config.json ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ "architectures": [
3
+ "BertForSequenceClassification"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "cell": {},
7
+ "classifier_dropout": null,
8
+ "emb_size": 312,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 312,
12
+ "id2label": {
13
+ "0": "ham",
14
+ "1": "spam"
15
+ },
16
+ "initializer_range": 0.02,
17
+ "intermediate_size": 1200,
18
+ "label2id": {
19
+ "ham": 0,
20
+ "spam": 1
21
+ },
22
+ "layer_norm_eps": 1e-12,
23
+ "max_position_embeddings": 512,
24
+ "model_type": "bert",
25
+ "num_attention_heads": 12,
26
+ "num_hidden_layers": 4,
27
+ "pad_token_id": 0,
28
+ "position_embedding_type": "absolute",
29
+ "pre_trained": "",
30
+ "structure": [],
31
+ "torch_dtype": "float32",
32
+ "transformers_version": "4.55.0",
33
+ "type_vocab_size": 2,
34
+ "use_cache": true,
35
+ "vocab_size": 30522
36
+ }
checkpoint-1200/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0ed481638e470979590a6a0c03f9237e2ef00619bfde08b505adc569fc3f3e70
3
+ size 57411808
checkpoint-1200/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:54b3ca65cdc181c36eb175979422431446d89583bba5da2fcdccca2ae639bb02
3
+ size 114864267
checkpoint-1200/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:86b4a86c121d70b2aa0ea130d6238e100bb2801dea0fa773b97fc0541331b11a
3
+ size 14455
checkpoint-1200/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:60b4839f57741fdc7ac5a9d26657752bea98ceb92dd0989c5b64592f830ab0aa
3
+ size 1465
checkpoint-1200/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
checkpoint-1200/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1200/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 1000000000000000019884624838656,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
checkpoint-1200/trainer_state.json ADDED
@@ -0,0 +1,283 @@
1
+ {
2
+ "best_global_step": 1000,
3
+ "best_metric": 0.9968069666182874,
4
+ "best_model_checkpoint": "models/tinybert-enron\\checkpoint-1000",
5
+ "epoch": 1.2671594508975712,
6
+ "eval_steps": 200,
7
+ "global_step": 1200,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.05279831045406547,
14
+ "grad_norm": 0.7054851055145264,
15
+ "learning_rate": 3.878627968337731e-06,
16
+ "loss": 0.6916,
17
+ "step": 50
18
+ },
19
+ {
20
+ "epoch": 0.10559662090813093,
21
+ "grad_norm": 4.758987903594971,
22
+ "learning_rate": 7.836411609498681e-06,
23
+ "loss": 0.6419,
24
+ "step": 100
25
+ },
26
+ {
27
+ "epoch": 0.1583949313621964,
28
+ "grad_norm": 2.878147602081299,
29
+ "learning_rate": 1.179419525065963e-05,
30
+ "loss": 0.4382,
31
+ "step": 150
32
+ },
33
+ {
34
+ "epoch": 0.21119324181626187,
35
+ "grad_norm": 0.9188410639762878,
36
+ "learning_rate": 1.575197889182058e-05,
37
+ "loss": 0.2162,
38
+ "step": 200
39
+ },
40
+ {
41
+ "epoch": 0.21119324181626187,
42
+ "eval_accuracy": 0.997920997920998,
43
+ "eval_f1": 0.997965707643127,
44
+ "eval_loss": 0.11409526318311691,
45
+ "eval_precision": 0.9959396751740139,
46
+ "eval_recall": 1.0,
47
+ "eval_runtime": 77.9734,
48
+ "eval_samples_per_second": 43.181,
49
+ "eval_steps_per_second": 2.706,
50
+ "step": 200
51
+ },
52
+ {
53
+ "epoch": 0.26399155227032733,
54
+ "grad_norm": 0.42353588342666626,
55
+ "learning_rate": 1.970976253298153e-05,
56
+ "loss": 0.0759,
57
+ "step": 250
58
+ },
59
+ {
60
+ "epoch": 0.3167898627243928,
61
+ "grad_norm": 0.22129850089550018,
62
+ "learning_rate": 2.3667546174142482e-05,
63
+ "loss": 0.0466,
64
+ "step": 300
65
+ },
66
+ {
67
+ "epoch": 0.36958817317845827,
68
+ "grad_norm": 0.14293035864830017,
69
+ "learning_rate": 2.762532981530343e-05,
70
+ "loss": 0.0333,
71
+ "step": 350
72
+ },
73
+ {
74
+ "epoch": 0.42238648363252373,
75
+ "grad_norm": 0.09417829662561417,
76
+ "learning_rate": 2.982399530654151e-05,
77
+ "loss": 0.015,
78
+ "step": 400
79
+ },
80
+ {
81
+ "epoch": 0.42238648363252373,
82
+ "eval_accuracy": 0.9988119988119988,
83
+ "eval_f1": 0.9988365328679465,
84
+ "eval_loss": 0.012017497792840004,
85
+ "eval_precision": 0.9976757699012202,
86
+ "eval_recall": 1.0,
87
+ "eval_runtime": 75.9348,
88
+ "eval_samples_per_second": 44.341,
89
+ "eval_steps_per_second": 2.779,
90
+ "step": 400
91
+ },
92
+ {
93
+ "epoch": 0.4751847940865892,
94
+ "grad_norm": 0.06869468092918396,
95
+ "learning_rate": 2.938398357289528e-05,
96
+ "loss": 0.0143,
97
+ "step": 450
98
+ },
99
+ {
100
+ "epoch": 0.5279831045406547,
101
+ "grad_norm": 0.05820750445127487,
102
+ "learning_rate": 2.8943971839249047e-05,
103
+ "loss": 0.0161,
104
+ "step": 500
105
+ },
106
+ {
107
+ "epoch": 0.5807814149947201,
108
+ "grad_norm": 0.053672004491090775,
109
+ "learning_rate": 2.8503960105602817e-05,
110
+ "loss": 0.0107,
111
+ "step": 550
112
+ },
113
+ {
114
+ "epoch": 0.6335797254487856,
115
+ "grad_norm": 121.33438873291016,
116
+ "learning_rate": 2.8063948371956588e-05,
117
+ "loss": 0.0068,
118
+ "step": 600
119
+ },
120
+ {
121
+ "epoch": 0.6335797254487856,
122
+ "eval_accuracy": 0.9991089991089991,
123
+ "eval_f1": 0.999127145766657,
124
+ "eval_loss": 0.006898785941302776,
125
+ "eval_precision": 0.9982558139534884,
126
+ "eval_recall": 1.0,
127
+ "eval_runtime": 87.7806,
128
+ "eval_samples_per_second": 38.357,
129
+ "eval_steps_per_second": 2.404,
130
+ "step": 600
131
+ },
132
+ {
133
+ "epoch": 0.6863780359028511,
134
+ "grad_norm": 0.2175891101360321,
135
+ "learning_rate": 2.7623936638310355e-05,
136
+ "loss": 0.0123,
137
+ "step": 650
138
+ },
139
+ {
140
+ "epoch": 0.7391763463569165,
141
+ "grad_norm": 0.031244030222296715,
142
+ "learning_rate": 2.7183924904664125e-05,
143
+ "loss": 0.0087,
144
+ "step": 700
145
+ },
146
+ {
147
+ "epoch": 0.791974656810982,
148
+ "grad_norm": 0.028063887730240822,
149
+ "learning_rate": 2.6743913171017896e-05,
150
+ "loss": 0.0122,
151
+ "step": 750
152
+ },
153
+ {
154
+ "epoch": 0.8447729672650475,
155
+ "grad_norm": 0.025719981640577316,
156
+ "learning_rate": 2.6303901437371663e-05,
157
+ "loss": 0.0077,
158
+ "step": 800
159
+ },
160
+ {
161
+ "epoch": 0.8447729672650475,
162
+ "eval_accuracy": 0.9994059994059994,
163
+ "eval_f1": 0.9994179278230501,
164
+ "eval_loss": 0.004702265374362469,
165
+ "eval_precision": 0.9988365328679465,
166
+ "eval_recall": 1.0,
167
+ "eval_runtime": 99.4766,
168
+ "eval_samples_per_second": 33.847,
169
+ "eval_steps_per_second": 2.121,
170
+ "step": 800
171
+ },
172
+ {
173
+ "epoch": 0.8975712777191129,
174
+ "grad_norm": 0.02354113757610321,
175
+ "learning_rate": 2.5863889703725433e-05,
176
+ "loss": 0.0058,
177
+ "step": 850
178
+ },
179
+ {
180
+ "epoch": 0.9503695881731784,
181
+ "grad_norm": 0.02310008369386196,
182
+ "learning_rate": 2.5423877970079204e-05,
183
+ "loss": 0.0056,
184
+ "step": 900
185
+ },
186
+ {
187
+ "epoch": 1.0031678986272439,
188
+ "grad_norm": 0.019235238432884216,
189
+ "learning_rate": 2.498386623643297e-05,
190
+ "loss": 0.0106,
191
+ "step": 950
192
+ },
193
+ {
194
+ "epoch": 1.0559662090813093,
195
+ "grad_norm": 208.1472930908203,
196
+ "learning_rate": 2.454385450278674e-05,
197
+ "loss": 0.0124,
198
+ "step": 1000
199
+ },
200
+ {
201
+ "epoch": 1.0559662090813093,
202
+ "eval_accuracy": 0.9967329967329968,
203
+ "eval_f1": 0.9968069666182874,
204
+ "eval_loss": 0.016226448118686676,
205
+ "eval_precision": 0.9936342592592593,
206
+ "eval_recall": 1.0,
207
+ "eval_runtime": 88.1246,
208
+ "eval_samples_per_second": 38.207,
209
+ "eval_steps_per_second": 2.394,
210
+ "step": 1000
211
+ },
212
+ {
213
+ "epoch": 1.1087645195353748,
214
+ "grad_norm": 0.01713796705007553,
215
+ "learning_rate": 2.4103842769140512e-05,
216
+ "loss": 0.0012,
217
+ "step": 1050
218
+ },
219
+ {
220
+ "epoch": 1.1615628299894403,
221
+ "grad_norm": 0.015752457082271576,
222
+ "learning_rate": 2.366383103549428e-05,
223
+ "loss": 0.001,
224
+ "step": 1100
225
+ },
226
+ {
227
+ "epoch": 1.2143611404435057,
228
+ "grad_norm": 0.014375035651028156,
229
+ "learning_rate": 2.322381930184805e-05,
230
+ "loss": 0.0054,
231
+ "step": 1150
232
+ },
233
+ {
234
+ "epoch": 1.2671594508975712,
235
+ "grad_norm": 0.014459795318543911,
236
+ "learning_rate": 2.278380756820182e-05,
237
+ "loss": 0.0009,
238
+ "step": 1200
239
+ },
240
+ {
241
+ "epoch": 1.2671594508975712,
242
+ "eval_accuracy": 0.9997029997029997,
243
+ "eval_f1": 0.9997088791848617,
244
+ "eval_loss": 0.0029172594659030437,
245
+ "eval_precision": 0.9994179278230501,
246
+ "eval_recall": 1.0,
247
+ "eval_runtime": 70.0694,
248
+ "eval_samples_per_second": 48.052,
249
+ "eval_steps_per_second": 3.011,
250
+ "step": 1200
251
+ }
252
+ ],
253
+ "logging_steps": 50,
254
+ "max_steps": 3788,
255
+ "num_input_tokens_seen": 0,
256
+ "num_train_epochs": 4,
257
+ "save_steps": 200,
258
+ "stateful_callbacks": {
259
+ "EarlyStoppingCallback": {
260
+ "args": {
261
+ "early_stopping_patience": 5,
262
+ "early_stopping_threshold": 0.0005
263
+ },
264
+ "attributes": {
265
+ "early_stopping_patience_counter": 1
266
+ }
267
+ },
268
+ "TrainerControl": {
269
+ "args": {
270
+ "should_epoch_stop": false,
271
+ "should_evaluate": false,
272
+ "should_log": false,
273
+ "should_save": true,
274
+ "should_training_stop": false
275
+ },
276
+ "attributes": {}
277
+ }
278
+ },
279
+ "total_flos": 275258541014016.0,
280
+ "train_batch_size": 32,
281
+ "trial_name": null,
282
+ "trial_params": null
283
+ }
checkpoint-1200/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0f6ee5be6db1ab816abaa77671d6a299c7d2015f383c82c395377bcfdce9d1cd
3
+ size 5713
checkpoint-1200/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1400/config.json ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ "architectures": [
3
+ "BertForSequenceClassification"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "cell": {},
7
+ "classifier_dropout": null,
8
+ "emb_size": 312,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 312,
12
+ "id2label": {
13
+ "0": "ham",
14
+ "1": "spam"
15
+ },
16
+ "initializer_range": 0.02,
17
+ "intermediate_size": 1200,
18
+ "label2id": {
19
+ "ham": 0,
20
+ "spam": 1
21
+ },
22
+ "layer_norm_eps": 1e-12,
23
+ "max_position_embeddings": 512,
24
+ "model_type": "bert",
25
+ "num_attention_heads": 12,
26
+ "num_hidden_layers": 4,
27
+ "pad_token_id": 0,
28
+ "position_embedding_type": "absolute",
29
+ "pre_trained": "",
30
+ "structure": [],
31
+ "torch_dtype": "float32",
32
+ "transformers_version": "4.55.0",
33
+ "type_vocab_size": 2,
34
+ "use_cache": true,
35
+ "vocab_size": 30522
36
+ }
checkpoint-1400/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:afa8912f674d23e64c0e45a67afffeed25fcbce273fdc8906b4d241b5954833d
3
+ size 57411808
checkpoint-1400/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f741ecb400556d723d2d0a88f1211dbdb7b894c40a68fe2ebed38b0c194fae7d
3
+ size 114864267
checkpoint-1400/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:abef661c396d70ac459e7ba6608cac0365c9ffeb4d0e6dada67d33354417afdd
3
+ size 14455
checkpoint-1400/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:484087520d4e9e312982d2bfa6b0a475c2703200e2eb36812500de0a364fed26
3
+ size 1465
checkpoint-1400/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
checkpoint-1400/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1400/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 1000000000000000019884624838656,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
checkpoint-1400/trainer_state.json ADDED
@@ -0,0 +1,323 @@
1
+ {
2
+ "best_global_step": 1000,
3
+ "best_metric": 0.9968069666182874,
4
+ "best_model_checkpoint": "models/tinybert-enron\\checkpoint-1000",
5
+ "epoch": 1.478352692713833,
6
+ "eval_steps": 200,
7
+ "global_step": 1400,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.05279831045406547,
14
+ "grad_norm": 0.7054851055145264,
15
+ "learning_rate": 3.878627968337731e-06,
16
+ "loss": 0.6916,
17
+ "step": 50
18
+ },
19
+ {
20
+ "epoch": 0.10559662090813093,
21
+ "grad_norm": 4.758987903594971,
22
+ "learning_rate": 7.836411609498681e-06,
23
+ "loss": 0.6419,
24
+ "step": 100
25
+ },
26
+ {
27
+ "epoch": 0.1583949313621964,
28
+ "grad_norm": 2.878147602081299,
29
+ "learning_rate": 1.179419525065963e-05,
30
+ "loss": 0.4382,
31
+ "step": 150
32
+ },
33
+ {
34
+ "epoch": 0.21119324181626187,
35
+ "grad_norm": 0.9188410639762878,
36
+ "learning_rate": 1.575197889182058e-05,
37
+ "loss": 0.2162,
38
+ "step": 200
39
+ },
40
+ {
41
+ "epoch": 0.21119324181626187,
42
+ "eval_accuracy": 0.997920997920998,
43
+ "eval_f1": 0.997965707643127,
44
+ "eval_loss": 0.11409526318311691,
45
+ "eval_precision": 0.9959396751740139,
46
+ "eval_recall": 1.0,
47
+ "eval_runtime": 77.9734,
48
+ "eval_samples_per_second": 43.181,
49
+ "eval_steps_per_second": 2.706,
50
+ "step": 200
51
+ },
52
+ {
53
+ "epoch": 0.26399155227032733,
54
+ "grad_norm": 0.42353588342666626,
55
+ "learning_rate": 1.970976253298153e-05,
56
+ "loss": 0.0759,
57
+ "step": 250
58
+ },
59
+ {
60
+ "epoch": 0.3167898627243928,
61
+ "grad_norm": 0.22129850089550018,
62
+ "learning_rate": 2.3667546174142482e-05,
63
+ "loss": 0.0466,
64
+ "step": 300
65
+ },
66
+ {
67
+ "epoch": 0.36958817317845827,
68
+ "grad_norm": 0.14293035864830017,
69
+ "learning_rate": 2.762532981530343e-05,
70
+ "loss": 0.0333,
71
+ "step": 350
72
+ },
73
+ {
74
+ "epoch": 0.42238648363252373,
75
+ "grad_norm": 0.09417829662561417,
76
+ "learning_rate": 2.982399530654151e-05,
77
+ "loss": 0.015,
78
+ "step": 400
79
+ },
80
+ {
81
+ "epoch": 0.42238648363252373,
82
+ "eval_accuracy": 0.9988119988119988,
83
+ "eval_f1": 0.9988365328679465,
84
+ "eval_loss": 0.012017497792840004,
85
+ "eval_precision": 0.9976757699012202,
86
+ "eval_recall": 1.0,
87
+ "eval_runtime": 75.9348,
88
+ "eval_samples_per_second": 44.341,
89
+ "eval_steps_per_second": 2.779,
90
+ "step": 400
91
+ },
92
+ {
93
+ "epoch": 0.4751847940865892,
94
+ "grad_norm": 0.06869468092918396,
95
+ "learning_rate": 2.938398357289528e-05,
96
+ "loss": 0.0143,
97
+ "step": 450
98
+ },
99
+ {
100
+ "epoch": 0.5279831045406547,
101
+ "grad_norm": 0.05820750445127487,
102
+ "learning_rate": 2.8943971839249047e-05,
103
+ "loss": 0.0161,
104
+ "step": 500
105
+ },
106
+ {
107
+ "epoch": 0.5807814149947201,
108
+ "grad_norm": 0.053672004491090775,
109
+ "learning_rate": 2.8503960105602817e-05,
110
+ "loss": 0.0107,
111
+ "step": 550
112
+ },
113
+ {
114
+ "epoch": 0.6335797254487856,
115
+ "grad_norm": 121.33438873291016,
116
+ "learning_rate": 2.8063948371956588e-05,
117
+ "loss": 0.0068,
118
+ "step": 600
119
+ },
120
+ {
121
+ "epoch": 0.6335797254487856,
122
+ "eval_accuracy": 0.9991089991089991,
123
+ "eval_f1": 0.999127145766657,
124
+ "eval_loss": 0.006898785941302776,
125
+ "eval_precision": 0.9982558139534884,
126
+ "eval_recall": 1.0,
127
+ "eval_runtime": 87.7806,
128
+ "eval_samples_per_second": 38.357,
129
+ "eval_steps_per_second": 2.404,
130
+ "step": 600
131
+ },
132
+ {
133
+ "epoch": 0.6863780359028511,
134
+ "grad_norm": 0.2175891101360321,
135
+ "learning_rate": 2.7623936638310355e-05,
136
+ "loss": 0.0123,
137
+ "step": 650
138
+ },
139
+ {
140
+ "epoch": 0.7391763463569165,
141
+ "grad_norm": 0.031244030222296715,
142
+ "learning_rate": 2.7183924904664125e-05,
143
+ "loss": 0.0087,
144
+ "step": 700
145
+ },
146
+ {
147
+ "epoch": 0.791974656810982,
148
+ "grad_norm": 0.028063887730240822,
149
+ "learning_rate": 2.6743913171017896e-05,
150
+ "loss": 0.0122,
151
+ "step": 750
152
+ },
153
+ {
154
+ "epoch": 0.8447729672650475,
155
+ "grad_norm": 0.025719981640577316,
156
+ "learning_rate": 2.6303901437371663e-05,
157
+ "loss": 0.0077,
158
+ "step": 800
159
+ },
160
+ {
161
+ "epoch": 0.8447729672650475,
162
+ "eval_accuracy": 0.9994059994059994,
163
+ "eval_f1": 0.9994179278230501,
164
+ "eval_loss": 0.004702265374362469,
165
+ "eval_precision": 0.9988365328679465,
166
+ "eval_recall": 1.0,
167
+ "eval_runtime": 99.4766,
168
+ "eval_samples_per_second": 33.847,
169
+ "eval_steps_per_second": 2.121,
170
+ "step": 800
171
+ },
172
+ {
173
+ "epoch": 0.8975712777191129,
174
+ "grad_norm": 0.02354113757610321,
175
+ "learning_rate": 2.5863889703725433e-05,
176
+ "loss": 0.0058,
177
+ "step": 850
178
+ },
179
+ {
180
+ "epoch": 0.9503695881731784,
181
+ "grad_norm": 0.02310008369386196,
182
+ "learning_rate": 2.5423877970079204e-05,
183
+ "loss": 0.0056,
184
+ "step": 900
185
+ },
186
+ {
187
+ "epoch": 1.0031678986272439,
188
+ "grad_norm": 0.019235238432884216,
189
+ "learning_rate": 2.498386623643297e-05,
190
+ "loss": 0.0106,
191
+ "step": 950
192
+ },
193
+ {
194
+ "epoch": 1.0559662090813093,
195
+ "grad_norm": 208.1472930908203,
196
+ "learning_rate": 2.454385450278674e-05,
197
+ "loss": 0.0124,
198
+ "step": 1000
199
+ },
200
+ {
201
+ "epoch": 1.0559662090813093,
202
+ "eval_accuracy": 0.9967329967329968,
203
+ "eval_f1": 0.9968069666182874,
204
+ "eval_loss": 0.016226448118686676,
205
+ "eval_precision": 0.9936342592592593,
206
+ "eval_recall": 1.0,
207
+ "eval_runtime": 88.1246,
208
+ "eval_samples_per_second": 38.207,
209
+ "eval_steps_per_second": 2.394,
210
+ "step": 1000
211
+ },
212
+ {
213
+ "epoch": 1.1087645195353748,
214
+ "grad_norm": 0.01713796705007553,
215
+ "learning_rate": 2.4103842769140512e-05,
216
+ "loss": 0.0012,
217
+ "step": 1050
218
+ },
219
+ {
220
+ "epoch": 1.1615628299894403,
221
+ "grad_norm": 0.015752457082271576,
222
+ "learning_rate": 2.366383103549428e-05,
223
+ "loss": 0.001,
224
+ "step": 1100
225
+ },
226
+ {
227
+ "epoch": 1.2143611404435057,
228
+ "grad_norm": 0.014375035651028156,
229
+ "learning_rate": 2.322381930184805e-05,
230
+ "loss": 0.0054,
231
+ "step": 1150
232
+ },
233
+ {
234
+ "epoch": 1.2671594508975712,
235
+ "grad_norm": 0.014459795318543911,
236
+ "learning_rate": 2.278380756820182e-05,
237
+ "loss": 0.0009,
238
+ "step": 1200
239
+ },
240
+ {
241
+ "epoch": 1.2671594508975712,
242
+ "eval_accuracy": 0.9997029997029997,
243
+ "eval_f1": 0.9997088791848617,
244
+ "eval_loss": 0.0029172594659030437,
245
+ "eval_precision": 0.9994179278230501,
246
+ "eval_recall": 1.0,
247
+ "eval_runtime": 70.0694,
248
+ "eval_samples_per_second": 48.052,
249
+ "eval_steps_per_second": 3.011,
250
+ "step": 1200
251
+ },
252
+ {
253
+ "epoch": 1.3199577613516367,
254
+ "grad_norm": 0.012884082272648811,
255
+ "learning_rate": 2.2343795834555587e-05,
256
+ "loss": 0.0008,
257
+ "step": 1250
258
+ },
259
+ {
260
+ "epoch": 1.3727560718057021,
261
+ "grad_norm": 0.013439378701150417,
262
+ "learning_rate": 2.1903784100909357e-05,
263
+ "loss": 0.0127,
264
+ "step": 1300
265
+ },
266
+ {
267
+ "epoch": 1.4255543822597676,
268
+ "grad_norm": 0.014692062512040138,
269
+ "learning_rate": 2.1463772367263128e-05,
270
+ "loss": 0.0101,
271
+ "step": 1350
272
+ },
273
+ {
274
+ "epoch": 1.478352692713833,
275
+ "grad_norm": 0.012950174510478973,
276
+ "learning_rate": 2.10237606336169e-05,
277
+ "loss": 0.0054,
278
+ "step": 1400
279
+ },
280
+ {
281
+ "epoch": 1.478352692713833,
282
+ "eval_accuracy": 0.9997029997029997,
283
+ "eval_f1": 0.9997088791848617,
284
+ "eval_loss": 0.002878110622987151,
285
+ "eval_precision": 0.9994179278230501,
286
+ "eval_recall": 1.0,
287
+ "eval_runtime": 82.6217,
288
+ "eval_samples_per_second": 40.752,
289
+ "eval_steps_per_second": 2.554,
290
+ "step": 1400
291
+ }
292
+ ],
293
+ "logging_steps": 50,
294
+ "max_steps": 3788,
295
+ "num_input_tokens_seen": 0,
296
+ "num_train_epochs": 4,
297
+ "save_steps": 200,
298
+ "stateful_callbacks": {
299
+ "EarlyStoppingCallback": {
300
+ "args": {
301
+ "early_stopping_patience": 5,
302
+ "early_stopping_threshold": 0.0005
303
+ },
304
+ "attributes": {
305
+ "early_stopping_patience_counter": 2
306
+ }
307
+ },
308
+ "TrainerControl": {
309
+ "args": {
310
+ "should_epoch_stop": false,
311
+ "should_evaluate": false,
312
+ "should_log": false,
313
+ "should_save": true,
314
+ "should_training_stop": false
315
+ },
316
+ "attributes": {}
317
+ }
318
+ },
319
+ "total_flos": 321143328930816.0,
320
+ "train_batch_size": 32,
321
+ "trial_name": null,
322
+ "trial_params": null
323
+ }
checkpoint-1400/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0f6ee5be6db1ab816abaa77671d6a299c7d2015f383c82c395377bcfdce9d1cd
3
+ size 5713
checkpoint-1400/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1600/config.json ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ "architectures": [
3
+ "BertForSequenceClassification"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "cell": {},
7
+ "classifier_dropout": null,
8
+ "emb_size": 312,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 312,
12
+ "id2label": {
13
+ "0": "ham",
14
+ "1": "spam"
15
+ },
16
+ "initializer_range": 0.02,
17
+ "intermediate_size": 1200,
18
+ "label2id": {
19
+ "ham": 0,
20
+ "spam": 1
21
+ },
22
+ "layer_norm_eps": 1e-12,
23
+ "max_position_embeddings": 512,
24
+ "model_type": "bert",
25
+ "num_attention_heads": 12,
26
+ "num_hidden_layers": 4,
27
+ "pad_token_id": 0,
28
+ "position_embedding_type": "absolute",
29
+ "pre_trained": "",
30
+ "structure": [],
31
+ "torch_dtype": "float32",
32
+ "transformers_version": "4.55.0",
33
+ "type_vocab_size": 2,
34
+ "use_cache": true,
35
+ "vocab_size": 30522
36
+ }
checkpoint-1600/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2cf7c7dc0c5367beaf50e6e894a369c265343db0b965e64596e70e2c3dbc8811
3
+ size 57411808
checkpoint-1600/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7de1b059ed6bbe31af0fe288bce87f379dfe5eb52d6a06f069975e78a8e22e63
3
+ size 114864267
checkpoint-1600/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:281de45a32b26fc7fabb959de798e1b3fd70e15a16525a06643ef973b8c12bfa
3
+ size 14455
checkpoint-1600/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:825169d8720681852b22e97bf6d4c178b22717216530e5ce830bdd78c7efdd37
3
+ size 1465
checkpoint-1600/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
checkpoint-1600/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1600/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 1000000000000000019884624838656,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
checkpoint-1600/trainer_state.json ADDED
@@ -0,0 +1,363 @@
1
+ {
2
+ "best_global_step": 1000,
3
+ "best_metric": 0.9968069666182874,
4
+ "best_model_checkpoint": "models/tinybert-enron\\checkpoint-1000",
5
+ "epoch": 1.689545934530095,
6
+ "eval_steps": 200,
7
+ "global_step": 1600,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.05279831045406547,
14
+ "grad_norm": 0.7054851055145264,
15
+ "learning_rate": 3.878627968337731e-06,
16
+ "loss": 0.6916,
17
+ "step": 50
18
+ },
19
+ {
20
+ "epoch": 0.10559662090813093,
21
+ "grad_norm": 4.758987903594971,
22
+ "learning_rate": 7.836411609498681e-06,
23
+ "loss": 0.6419,
24
+ "step": 100
25
+ },
26
+ {
27
+ "epoch": 0.1583949313621964,
28
+ "grad_norm": 2.878147602081299,
29
+ "learning_rate": 1.179419525065963e-05,
30
+ "loss": 0.4382,
31
+ "step": 150
32
+ },
33
+ {
34
+ "epoch": 0.21119324181626187,
35
+ "grad_norm": 0.9188410639762878,
36
+ "learning_rate": 1.575197889182058e-05,
37
+ "loss": 0.2162,
38
+ "step": 200
39
+ },
40
+ {
41
+ "epoch": 0.21119324181626187,
42
+ "eval_accuracy": 0.997920997920998,
43
+ "eval_f1": 0.997965707643127,
44
+ "eval_loss": 0.11409526318311691,
45
+ "eval_precision": 0.9959396751740139,
46
+ "eval_recall": 1.0,
47
+ "eval_runtime": 77.9734,
48
+ "eval_samples_per_second": 43.181,
49
+ "eval_steps_per_second": 2.706,
50
+ "step": 200
51
+ },
52
+ {
53
+ "epoch": 0.26399155227032733,
54
+ "grad_norm": 0.42353588342666626,
55
+ "learning_rate": 1.970976253298153e-05,
56
+ "loss": 0.0759,
57
+ "step": 250
58
+ },
59
+ {
60
+ "epoch": 0.3167898627243928,
61
+ "grad_norm": 0.22129850089550018,
62
+ "learning_rate": 2.3667546174142482e-05,
63
+ "loss": 0.0466,
64
+ "step": 300
65
+ },
66
+ {
67
+ "epoch": 0.36958817317845827,
68
+ "grad_norm": 0.14293035864830017,
69
+ "learning_rate": 2.762532981530343e-05,
70
+ "loss": 0.0333,
71
+ "step": 350
72
+ },
73
+ {
74
+ "epoch": 0.42238648363252373,
75
+ "grad_norm": 0.09417829662561417,
76
+ "learning_rate": 2.982399530654151e-05,
77
+ "loss": 0.015,
78
+ "step": 400
79
+ },
80
+ {
81
+ "epoch": 0.42238648363252373,
82
+ "eval_accuracy": 0.9988119988119988,
83
+ "eval_f1": 0.9988365328679465,
84
+ "eval_loss": 0.012017497792840004,
85
+ "eval_precision": 0.9976757699012202,
86
+ "eval_recall": 1.0,
87
+ "eval_runtime": 75.9348,
88
+ "eval_samples_per_second": 44.341,
89
+ "eval_steps_per_second": 2.779,
90
+ "step": 400
91
+ },
92
+ {
93
+ "epoch": 0.4751847940865892,
94
+ "grad_norm": 0.06869468092918396,
95
+ "learning_rate": 2.938398357289528e-05,
96
+ "loss": 0.0143,
97
+ "step": 450
98
+ },
99
+ {
100
+ "epoch": 0.5279831045406547,
101
+ "grad_norm": 0.05820750445127487,
102
+ "learning_rate": 2.8943971839249047e-05,
103
+ "loss": 0.0161,
104
+ "step": 500
105
+ },
106
+ {
107
+ "epoch": 0.5807814149947201,
108
+ "grad_norm": 0.053672004491090775,
109
+ "learning_rate": 2.8503960105602817e-05,
110
+ "loss": 0.0107,
111
+ "step": 550
112
+ },
113
+ {
114
+ "epoch": 0.6335797254487856,
115
+ "grad_norm": 121.33438873291016,
116
+ "learning_rate": 2.8063948371956588e-05,
117
+ "loss": 0.0068,
118
+ "step": 600
119
+ },
120
+ {
121
+ "epoch": 0.6335797254487856,
122
+ "eval_accuracy": 0.9991089991089991,
123
+ "eval_f1": 0.999127145766657,
124
+ "eval_loss": 0.006898785941302776,
125
+ "eval_precision": 0.9982558139534884,
126
+ "eval_recall": 1.0,
127
+ "eval_runtime": 87.7806,
128
+ "eval_samples_per_second": 38.357,
129
+ "eval_steps_per_second": 2.404,
130
+ "step": 600
131
+ },
132
+ {
133
+ "epoch": 0.6863780359028511,
134
+ "grad_norm": 0.2175891101360321,
135
+ "learning_rate": 2.7623936638310355e-05,
136
+ "loss": 0.0123,
137
+ "step": 650
138
+ },
139
+ {
140
+ "epoch": 0.7391763463569165,
141
+ "grad_norm": 0.031244030222296715,
142
+ "learning_rate": 2.7183924904664125e-05,
143
+ "loss": 0.0087,
144
+ "step": 700
145
+ },
146
+ {
147
+ "epoch": 0.791974656810982,
148
+ "grad_norm": 0.028063887730240822,
149
+ "learning_rate": 2.6743913171017896e-05,
150
+ "loss": 0.0122,
151
+ "step": 750
152
+ },
153
+ {
154
+ "epoch": 0.8447729672650475,
155
+ "grad_norm": 0.025719981640577316,
156
+ "learning_rate": 2.6303901437371663e-05,
157
+ "loss": 0.0077,
158
+ "step": 800
159
+ },
160
+ {
161
+ "epoch": 0.8447729672650475,
162
+ "eval_accuracy": 0.9994059994059994,
163
+ "eval_f1": 0.9994179278230501,
164
+ "eval_loss": 0.004702265374362469,
165
+ "eval_precision": 0.9988365328679465,
166
+ "eval_recall": 1.0,
167
+ "eval_runtime": 99.4766,
168
+ "eval_samples_per_second": 33.847,
169
+ "eval_steps_per_second": 2.121,
170
+ "step": 800
171
+ },
172
+ {
173
+ "epoch": 0.8975712777191129,
174
+ "grad_norm": 0.02354113757610321,
175
+ "learning_rate": 2.5863889703725433e-05,
176
+ "loss": 0.0058,
177
+ "step": 850
178
+ },
179
+ {
180
+ "epoch": 0.9503695881731784,
181
+ "grad_norm": 0.02310008369386196,
182
+ "learning_rate": 2.5423877970079204e-05,
183
+ "loss": 0.0056,
184
+ "step": 900
185
+ },
186
+ {
187
+ "epoch": 1.0031678986272439,
188
+ "grad_norm": 0.019235238432884216,
189
+ "learning_rate": 2.498386623643297e-05,
190
+ "loss": 0.0106,
191
+ "step": 950
192
+ },
193
+ {
194
+ "epoch": 1.0559662090813093,
195
+ "grad_norm": 208.1472930908203,
196
+ "learning_rate": 2.454385450278674e-05,
197
+ "loss": 0.0124,
198
+ "step": 1000
199
+ },
200
+ {
201
+ "epoch": 1.0559662090813093,
202
+ "eval_accuracy": 0.9967329967329968,
203
+ "eval_f1": 0.9968069666182874,
204
+ "eval_loss": 0.016226448118686676,
205
+ "eval_precision": 0.9936342592592593,
206
+ "eval_recall": 1.0,
207
+ "eval_runtime": 88.1246,
208
+ "eval_samples_per_second": 38.207,
209
+ "eval_steps_per_second": 2.394,
210
+ "step": 1000
211
+ },
212
+ {
213
+ "epoch": 1.1087645195353748,
214
+ "grad_norm": 0.01713796705007553,
215
+ "learning_rate": 2.4103842769140512e-05,
216
+ "loss": 0.0012,
217
+ "step": 1050
218
+ },
219
+ {
220
+ "epoch": 1.1615628299894403,
221
+ "grad_norm": 0.015752457082271576,
222
+ "learning_rate": 2.366383103549428e-05,
223
+ "loss": 0.001,
224
+ "step": 1100
225
+ },
226
+ {
227
+ "epoch": 1.2143611404435057,
228
+ "grad_norm": 0.014375035651028156,
229
+ "learning_rate": 2.322381930184805e-05,
230
+ "loss": 0.0054,
231
+ "step": 1150
232
+ },
233
+ {
234
+ "epoch": 1.2671594508975712,
235
+ "grad_norm": 0.014459795318543911,
236
+ "learning_rate": 2.278380756820182e-05,
237
+ "loss": 0.0009,
238
+ "step": 1200
239
+ },
240
+ {
241
+ "epoch": 1.2671594508975712,
242
+ "eval_accuracy": 0.9997029997029997,
243
+ "eval_f1": 0.9997088791848617,
244
+ "eval_loss": 0.0029172594659030437,
245
+ "eval_precision": 0.9994179278230501,
246
+ "eval_recall": 1.0,
247
+ "eval_runtime": 70.0694,
248
+ "eval_samples_per_second": 48.052,
249
+ "eval_steps_per_second": 3.011,
250
+ "step": 1200
251
+ },
252
+ {
253
+ "epoch": 1.3199577613516367,
254
+ "grad_norm": 0.012884082272648811,
255
+ "learning_rate": 2.2343795834555587e-05,
256
+ "loss": 0.0008,
257
+ "step": 1250
258
+ },
259
+ {
260
+ "epoch": 1.3727560718057021,
261
+ "grad_norm": 0.013439378701150417,
262
+ "learning_rate": 2.1903784100909357e-05,
263
+ "loss": 0.0127,
264
+ "step": 1300
265
+ },
266
+ {
267
+ "epoch": 1.4255543822597676,
268
+ "grad_norm": 0.014692062512040138,
269
+ "learning_rate": 2.1463772367263128e-05,
270
+ "loss": 0.0101,
271
+ "step": 1350
272
+ },
273
+ {
274
+ "epoch": 1.478352692713833,
275
+ "grad_norm": 0.012950174510478973,
276
+ "learning_rate": 2.10237606336169e-05,
277
+ "loss": 0.0054,
278
+ "step": 1400
279
+ },
280
+ {
281
+ "epoch": 1.478352692713833,
282
+ "eval_accuracy": 0.9997029997029997,
283
+ "eval_f1": 0.9997088791848617,
284
+ "eval_loss": 0.002878110622987151,
285
+ "eval_precision": 0.9994179278230501,
286
+ "eval_recall": 1.0,
287
+ "eval_runtime": 82.6217,
288
+ "eval_samples_per_second": 40.752,
289
+ "eval_steps_per_second": 2.554,
290
+ "step": 1400
291
+ },
292
+ {
293
+ "epoch": 1.5311510031678988,
294
+ "grad_norm": 0.01537719089537859,
295
+ "learning_rate": 2.0583748899970665e-05,
296
+ "loss": 0.0053,
297
+ "step": 1450
298
+ },
299
+ {
300
+ "epoch": 1.583949313621964,
301
+ "grad_norm": 0.01542325783520937,
302
+ "learning_rate": 2.0143737166324436e-05,
303
+ "loss": 0.0053,
304
+ "step": 1500
305
+ },
306
+ {
307
+ "epoch": 1.6367476240760297,
308
+ "grad_norm": 0.012001622468233109,
309
+ "learning_rate": 1.9703725432678206e-05,
310
+ "loss": 0.0061,
311
+ "step": 1550
312
+ },
313
+ {
314
+ "epoch": 1.689545934530095,
315
+ "grad_norm": 0.015623683109879494,
316
+ "learning_rate": 1.9263713699031974e-05,
317
+ "loss": 0.0007,
318
+ "step": 1600
319
+ },
320
+ {
321
+ "epoch": 1.689545934530095,
322
+ "eval_accuracy": 0.9994059994059994,
323
+ "eval_f1": 0.9994179278230501,
324
+ "eval_loss": 0.003968308679759502,
325
+ "eval_precision": 0.9988365328679465,
326
+ "eval_recall": 1.0,
327
+ "eval_runtime": 73.337,
328
+ "eval_samples_per_second": 45.911,
329
+ "eval_steps_per_second": 2.877,
330
+ "step": 1600
331
+ }
332
+ ],
333
+ "logging_steps": 50,
334
+ "max_steps": 3788,
335
+ "num_input_tokens_seen": 0,
336
+ "num_train_epochs": 4,
337
+ "save_steps": 200,
338
+ "stateful_callbacks": {
339
+ "EarlyStoppingCallback": {
340
+ "args": {
341
+ "early_stopping_patience": 5,
342
+ "early_stopping_threshold": 0.0005
343
+ },
344
+ "attributes": {
345
+ "early_stopping_patience_counter": 3
346
+ }
347
+ },
348
+ "TrainerControl": {
349
+ "args": {
350
+ "should_epoch_stop": false,
351
+ "should_evaluate": false,
352
+ "should_log": false,
353
+ "should_save": true,
354
+ "should_training_stop": false
355
+ },
356
+ "attributes": {}
357
+ }
358
+ },
359
+ "total_flos": 367028116847616.0,
360
+ "train_batch_size": 32,
361
+ "trial_name": null,
362
+ "trial_params": null
363
+ }
checkpoint-1600/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0f6ee5be6db1ab816abaa77671d6a299c7d2015f383c82c395377bcfdce9d1cd
3
+ size 5713
checkpoint-1600/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1800/config.json ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ "architectures": [
3
+ "BertForSequenceClassification"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "cell": {},
7
+ "classifier_dropout": null,
8
+ "emb_size": 312,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 312,
12
+ "id2label": {
13
+ "0": "ham",
14
+ "1": "spam"
15
+ },
16
+ "initializer_range": 0.02,
17
+ "intermediate_size": 1200,
18
+ "label2id": {
19
+ "ham": 0,
20
+ "spam": 1
21
+ },
22
+ "layer_norm_eps": 1e-12,
23
+ "max_position_embeddings": 512,
24
+ "model_type": "bert",
25
+ "num_attention_heads": 12,
26
+ "num_hidden_layers": 4,
27
+ "pad_token_id": 0,
28
+ "position_embedding_type": "absolute",
29
+ "pre_trained": "",
30
+ "structure": [],
31
+ "torch_dtype": "float32",
32
+ "transformers_version": "4.55.0",
33
+ "type_vocab_size": 2,
34
+ "use_cache": true,
35
+ "vocab_size": 30522
36
+ }
checkpoint-1800/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c8f5d4180455ea736ab182def6b38b00dc107672bfcbbbec35ebd8f8f1eb0fd4
3
+ size 57411808
checkpoint-1800/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1d8b70853056ab19a8af77db7451a979f3b67ebf653055a81e61b9af9dbbfe47
3
+ size 114864267
checkpoint-1800/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:21ee00a18c8ad610e6aa84c09f68969220cd60698fda59a843c30c511d648944
3
+ size 14455
checkpoint-1800/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aa5df7cb9434da5febf4c571f7669ce12f83e4a51705d45719fff294ceb28256
3
+ size 1465