Update README.md
README.md
@@ -5,8 +5,7 @@ Test network using differential attention instead of classical attention (using
 - `test_train.py` runs with the exact configuration used to train this model and is the reproduction script. Data is assumed to be in JSONL format: one JSON object per line, each with a `"text"` field, e.g. `{"text": "example text"}`.
 
 # Notes:
--
--
+Appears to be very competent: it learned significantly faster than the GQA control and achieved a slightly better minimum loss.
 
 # Training Metrics
 
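The README only says that `test_train.py` expects JSONL input with a `"text"` field. The following is a minimal sketch of reading that format. It assumes one JSON object per line and that blank lines may be skipped. `load_texts` is a hypothetical helper for illustration, not a function taken from `test_train.py`.

```python
import io
import json
from typing import Iterable, Iterator


def load_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the "text" field of each JSONL record (hypothetical helper)."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are tolerated and skipped
        record = json.loads(line)  # each line is one standalone JSON object
        if "text" not in record:
            raise ValueError(f"line {lineno}: missing 'text' field")
        yield record["text"]


# Usage with an in-memory file; a real file would be opened with
# open(path, encoding="utf-8") and passed in the same way.
sample = io.StringIO('{"text": "example text"}\n\n{"text": "..."}\n')
texts = list(load_texts(sample))
print(texts)  # ['example text', '...']
```

Because `load_texts` is a generator, a large training file is streamed line by line rather than loaded into memory at once.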