RoBERTa-base trained with alpha-entmax attention, where alpha is increased linearly from 1.0 (softmax) to 2.0 (sparsemax) over the course of training.
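
As a rough illustration of what a linear alpha schedule can look like, here is a minimal sketch. The function name `linear_alpha` and its `step` / `total_steps` arguments are hypothetical and are not part of this repository; the actual training code may compute the schedule differently.

```python
def linear_alpha(step, total_steps, start=1.0, end=2.0):
    """Hypothetical helper: linearly interpolate alpha from `start` to `end`.

    `step` is the current training step and `total_steps` is the number of
    steps over which alpha is annealed. Steps outside [0, total_steps] are
    clamped so that alpha always stays between `start` and `end`.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of the schedule completed so far, clamped to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


if __name__ == "__main__":
    # Alpha at the start, midpoint, and end of a 1000-step schedule.
    for step in (0, 500, 1000):
        print(step, linear_alpha(step, total_steps=1000))
```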

To load the tokenizer and the model:
```python
from transformers import AutoTokenizer

from sparse_roberta import get_custom_model

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained('roberta-base')

# Load the model
model = get_custom_model(
    'mtreviso/sparsemax-roberta',
    initial_alpha=2.0,
    use_triton_entmax=False,
    from_scratch=False,
)
```
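
For intuition about `initial_alpha=2.0`: alpha-entmax with alpha = 2.0 is sparsemax, which can assign exactly zero probability to low-scoring positions. The sketch below is a standalone, pure-Python version of sparsemax for a single score vector. It is only an illustration and is not the implementation used by `sparse_roberta`; the function name `sparsemax` is chosen here for the example.

```python
def sparsemax(scores):
    """Illustrative sparsemax over a list of floats (not the repo's code).

    Projects `scores` onto the probability simplex: the result is
    non-negative, sums to 1, and may contain exact zeros.
    """
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, zj in enumerate(z, start=1):
        cumsum += zj
        # Position j is in the support while 1 + j * z_(j) > sum of top-j scores.
        if 1.0 + j * zj > cumsum:
            # Threshold computed from the current support size j.
            tau = (cumsum - 1.0) / j
    # Shift by the threshold and clip negatives to zero.
    return [max(s - tau, 0.0) for s in scores]


if __name__ == "__main__":
    print(sparsemax([2.0, 1.0, 0.1]))  # [1.0, 0.0, 0.0]
```

With softmax (alpha = 1.0), the same scores would give every position a strictly positive weight.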

To fine-tune on GLUE tasks, use the `run_glue.py` script. For example, for RTE:
```bash
python run_glue.py \
  --model_name_or_path mtreviso/sparsemax-roberta \
  --config_name roberta-base \
  --tokenizer_name roberta-base \
  --task_name rte \
  --output_dir output-rte \
  --do_train \
  --do_eval \
  --max_seq_length 512 \
  --per_device_train_batch_size 32 \
  --learning_rate 3e-5 \
  --num_train_epochs 3 \
  --save_steps 1000 \
  --logging_steps 100 \
  --save_total_limit 1 \
  --overwrite_output_dir
```