RoBERTa-base trained with a linearly increasing alpha for alpha-entmax attention (from 1.0, which is softmax, to 2.0, which is sparsemax).

To load the model:

```python
from transformers import AutoTokenizer

from sparse_roberta import get_custom_model

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('roberta-base')

# Load the model
model = get_custom_model(
    'mtreviso/sparsemax-roberta',
    initial_alpha=2.0,
    use_triton_entmax=False,
    from_scratch=False,
)
```

To run GLUE tasks, you can use the `run_glue.py` script. For example:

```
python run_glue.py \
    --model_name_or_path mtreviso/sparsemax-roberta \
    --config_name roberta-base \
    --tokenizer_name roberta-base \
    --task_name rte \
    --output_dir output-rte \
    --do_train \
    --do_eval \
    --max_seq_length 512 \
    --per_device_train_batch_size 32 \
    --learning_rate 3e-5 \
    --num_train_epochs 3 \
    --save_steps 1000 \
    --logging_steps 100 \
    --save_total_limit 1 \
    --overwrite_output_dir
```
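
The description says alpha was increased linearly from 1.0 to 2.0 during training. The exact schedule is not given here, so the following is only a minimal sketch of what such a schedule could look like. The function name `linear_alpha` and its parameters are hypothetical and are not part of `sparse_roberta`.

```python
def linear_alpha(step: int, total_steps: int,
                 start: float = 1.0, end: float = 2.0) -> float:
    """Linearly interpolate alpha from `start` to `end` over `total_steps` steps.

    Hypothetical helper for illustration; the schedule actually used to
    train this checkpoint may differ (e.g. warmup, per-layer values).
    """
    if total_steps <= 1:
        return end
    # Clamp progress to [0, 1] so steps past the end stay at `end`.
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * progress


if __name__ == "__main__":
    # Print alpha at a few points of a 1001-step run.
    for step in (0, 250, 500, 750, 1000):
        print(step, linear_alpha(step, 1001))
```

Since training ends at alpha = 2.0, loading the checkpoint with `initial_alpha=2.0`, as in the loading example, matches the final value of this schedule.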