gpt2moe_het2_100mb

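This model is a fine-tuned version of a base model (not named in this card) on the arrow dataset. It achieves the following result on the evaluation set: a validation loss of 4.2740 at the last logged evaluation step (74000).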

Model description, intended uses and limitations, and training and evaluation data: more information needed.

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 7404
training_steps: 74047
mixed_precision_training: Native AMP
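
The relationship between some of these values can be sketched in a few lines of Python. This is a minimal illustration, not the training script: it assumes a single device (the card does not state the device count) and assumes that the "linear" scheduler ramps the learning rate up from zero during warmup and then decays it linearly to zero at the final step. The names `effective_batch_size`, `linear_lr`, and `NUM_DEVICES` are hypothetical.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 7404
TRAINING_STEPS = 74047
NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_device * accum_steps * devices


def linear_lr(step: int,
              peak: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS,
              total: int = TRAINING_STEPS) -> float:
    """Linear warmup from 0 to peak, then linear decay from peak to 0."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    print(f"warmup fraction: {WARMUP_STEPS / TRAINING_STEPS:.3f}")
    for s in (0, WARMUP_STEPS, TRAINING_STEPS // 2, TRAINING_STEPS):
        print(s, linear_lr(s))
```

Under these assumptions, 8 samples per step times 4 accumulation steps gives the listed total_train_batch_size of 32, and the warmup covers roughly the first tenth of training.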

Training Loss	Epoch	Step	Validation Loss
No log	0	0	11.0993
7.3615	0.2701	2000	7.0780
6.47	0.5402	4000	6.1761
5.9733	0.8103	6000	5.6818
5.6141	1.0804	8000	5.3557
5.4025	1.3504	10000	5.1035
5.2257	1.6205	12000	4.9233
5.1025	1.8906	14000	4.8059
4.9549	2.1607	16000	4.7256
4.9029	2.4308	18000	4.6622
4.8632	2.7009	20000	4.6041
4.8398	2.9710	22000	4.5621
4.8182	3.0	22215	4.5559
4.6986	3.2411	24000	4.5316
4.6922	3.5111	26000	4.5008
4.6772	3.7812	28000	4.4704
4.5482	4.0513	30000	4.4513
4.557	4.3214	32000	4.4324
4.5699	4.5915	34000	4.4124
4.5508	4.8616	36000	4.3927
4.4423	5.1317	38000	4.3846
4.4491	5.4018	40000	4.3701
4.4602	5.6718	42000	4.3575
4.4429	5.9419	44000	4.3404
4.353	6.2120	46000	4.3403
4.3662	6.4821	48000	4.3306
4.3708	6.7522	50000	4.3197
4.29	7.0223	52000	4.3150
4.2882	7.2924	54000	4.3123
4.2945	7.5625	56000	4.3045
4.3034	7.8325	58000	4.2956
4.2248	8.1026	60000	4.2956
4.2257	8.3727	62000	4.2925
4.2318	8.6428	64000	4.2852
4.236	8.9129	66000	4.2798
4.1746	9.1830	68000	4.2823
4.1798	9.4531	70000	4.2792
4.1827	9.7232	72000	4.2759
4.1743	9.9932	74000	4.2740
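
To make the table easier to interpret, the sketch below converts a validation loss to perplexity and recovers the epoch column from the step column. It assumes the reported loss is a mean per-token cross-entropy in nats (the card does not say so explicitly) and infers about 7405 optimizer steps per epoch from the row at epoch 3.0, step 22215. The helper names are hypothetical.

```python
import math

# Values read from the results table above.
INITIAL_VAL_LOSS = 11.0993   # step 0
FINAL_VAL_LOSS = 4.2740      # step 74000
STEPS_AT_EPOCH_3 = 22215     # row with Epoch = 3.0

# Inferred, not stated in the card: optimizer steps per epoch.
STEPS_PER_EPOCH = STEPS_AT_EPOCH_3 / 3


def perplexity(loss: float) -> float:
    """exp(loss), valid if loss is mean cross-entropy in nats per token."""
    return math.exp(loss)


def epoch_at(step: int) -> float:
    """Approximate epoch value for a given optimizer step."""
    return step / STEPS_PER_EPOCH


if __name__ == "__main__":
    print(f"perplexity at step 0:     {perplexity(INITIAL_VAL_LOSS):.1f}")
    print(f"perplexity at step 74000: {perplexity(FINAL_VAL_LOSS):.1f}")
    print(f"epoch at step 2000:       {epoch_at(2000):.4f}")
```

If the loss is indeed in nats per token, the final validation loss corresponds to a perplexity of roughly 72.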

Safetensors checkpoint. Model size: 0.2B params. Tensor type: F32.
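
A rough estimate of the weight footprint follows from the parameter count and tensor type. This is only a back-of-the-envelope sketch: "0.2B" is a rounded figure, so the exact size may differ noticeably, and the estimate ignores activations, optimizer state, and file metadata. The function name `weights_size_gb` is hypothetical.

```python
APPROX_PARAMS = 0.2e9    # rounded figure from the card ("0.2B params")
BYTES_PER_F32 = 4        # a 32-bit float occupies 4 bytes


def weights_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"~{weights_size_gb(APPROX_PARAMS, BYTES_PER_F32):.1f} GB of F32 weights")
```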