---
identifier: https://huggingface.co/oeg/RoBERTaSense-FACIL
name: RoBERTaSense-FACIL
version: 0.1.0
keywords:
  - easy-to-read
  - meaning preservation
  - accessibility
  - spanish
  - text pair classification
headline: >-
  Spanish RoBERTa fine-tuned to assess meaning preservation in Easy-to-Read
  (E2R) adaptations.
description: >
  RoBERTaSense-FACIL is a Spanish RoBERTa model fine-tuned to assess meaning
  preservation in Easy-to-Read (E2R) adaptations. Given a pair
  {original, adapted}, it predicts whether the adaptation preserves the
  meaning of the original.

  ⚠️ Deprecation notice (base model): fine-tuned from
  PlanTL-GOB-ES/roberta-base-bne, which is deprecated as of 2025. For actively
  maintained Spanish RoBERTa models, see BSC-LT.
task:
  - Text classification
  - Pairwise classification
modelCategory:
  - Supervised classification
language:
  - es
license: apache-2.0
parameterSize: 125M
developmentStatus: Active
dateCreated: 25-09-2025
dateModified: 06-10-2025
citation: >
  Diab Lozano, I., & Suárez-Figueroa, M. C. (2025). RoBERTaSense-FACIL:
  Meaning Preservation for Easy-to-Read in Spanish. Retrieved from
  https://huggingface.co/oeg/RoBERTaSense-FACIL
codeRepository: ''
referencePublication: ''
developmentLibrary: PyTorch + Transformers
usageInstructions: |
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
  import torch

  repo = "oeg/RoBERTaSense-FACIL"
  model = AutoModelForSequenceClassification.from_pretrained(repo)
  tokenizer = AutoTokenizer.from_pretrained(repo)

  original = "El lobo, que parecía amable, engañó a Caperucita."
  adapted = "El lobo parecía amable. El lobo engañó a Caperucita."

  inputs = tokenizer(original, adapted, return_tensors="pt", truncation=True, max_length=512)
  with torch.no_grad():
      logits = model(**inputs).logits
  probs = logits.softmax(-1).squeeze().tolist()
  print({model.config.id2label[i]: probs[i] for i in range(len(probs))})
modelRisks:
  - Trained for Spanish E2R; out-of-domain performance may degrade.
  - >-
    Binary labels compress nuanced cases; borderline adaptations may require
    human review.
  - Synthetic negatives do not cover all real-world human errors.
  - Base model is deprecated; security/robustness updates will not be inherited.
evaluationMetrics:
  - Accuracy
  - F1
  - ROC-AUC
evaluationResults: |
  80/20 stratified split (seed=42). Example results:
  - Accuracy: 0.81
  - F1: 0.84
  - ROC-AUC: 0.83
softwareRequirements:
  - python>=3.9
  - torch>=2.0
  - transformers>=4.40
  - datasets>=2.18
storageRequirements:
  - ~500 MB
memoryRequirements:
  - >-
    >= 8 GB RAM (CPU inference), >= 12 GB VRAM recommended for large batch
    inference
operatingSystem:
  - Linux
  - macOS
  - Windows
processorRequirements:
  - x86_64 CPU (AVX recommended)
GPURequirements:
  - >-
    Not required for single-pair inference; CUDA GPU recommended for batch
    processing
distribution:
  - encodingFormat: ''
    contentUrl: ''
    contentSize: ''
    quantizationBits: ''
    quantizationMethod: ''
trainedOn:
  - identifier: internal:e2r-positives
    name: Expert-validated E2R pairs (Spanish)
    description: >
      Positive pairs (original↔adapted) from an existing corpus validated by
      experts; used as the positive class.
    url: ''
  - identifier: internal:synthetic-negatives
    name: Synthetic hard negatives (Spanish)
    description: >
      Negatives generated via sentence shuffle, dropout, mismatch
      (derangement), paraphrase-with-distortion, and zero-shot NLI
      contradictions; trivial pairs filtered by BLEU/ROUGE-L thresholds.
    url: ''
testedOn:
  - identifier: internal:heldout-20
    name: Held-out 20% stratified split
    description: >
      Stratified 80/20 split by Label (seed=42); pairwise tokenization up to
      512 tokens.
evaluatedOn:
  - identifier: internal:heldout-20
    name: Held-out 20% stratified split
    description: >
      Metrics: Accuracy, F1, ROC-AUC; operating threshold tuned via Youden's J
      (ROC).
validatedOn: ''
author:
  - name: Isam Diab Lozano
    identifier: https://orcid.org/0000-0002-3967-0672
  - name: Mari Carmen Suárez-Figueroa
    identifier: https://orcid.org/0000-0003-3807-5019
successorOf: ''
funder:
  - name: Comunidad de Madrid (PIPF-2022/COM-25762)
    identifier: ''
sharedBy:
  - name: Ontology Engineering Group (UPM)
    identifier: https://oeg.fi.upm.es/index.php/en/index.html
wasGeneratedBy:
  - trainingRegion:
      - name: Europe (West)
    cloudProvider:
      - name: ''
        url: ''
    duration: ''
    hardwareType: ''
fineTunedFromModel: https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne
sdPublisher:
  - name: Ontology Engineering Group
    url: https://oeg.fi.upm.es/index.php/en/index.html
sdLicense: apache-2.0
metrics:
  - accuracy
  - f1
  - roc_auc
base_model:
  - PlanTL-GOB-ES/roberta-base-bne
pipeline_tag: text-classification
tags:
  - easy-to-read
  - meaning-preservation
---

## Model Card for RoBERTaSense-FACIL

**RoBERTaSense-FACIL** (RoBERTa Fine-tuned for Accessible Comprehension In Language) is a Spanish RoBERTa model fine-tuned to assess **meaning preservation** in **Easy-to-Read (E2R)** adaptations. Given a pair of texts {original, adapted}, it predicts whether the adaptation **preserves** the meaning of the original.

⚠️ **Deprecation notice (base model):** This model was fine-tuned from `PlanTL-GOB-ES/roberta-base-bne`. As of September 2025, that checkpoint is **deprecated** and no longer actively maintained. For actively maintained Spanish RoBERTa models, please see the **BSC-LT** organization: https://huggingface.co/BSC-LT

---

## 🚀 How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo = "oeg/RoBERTaSense-FACIL"
model = AutoModelForSequenceClassification.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

original = "El lobo, que parecía amable, engañó a Caperucita."
adapted = "El lobo parecía amable. El lobo engañó a Caperucita."

# Encode the pair (original, adapted)
inputs = tokenizer(original, adapted, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

probs = logits.softmax(-1).squeeze().tolist()
print({model.config.id2label[i]: probs[i] for i in range(len(probs))})
```

**Suggested labels (adjust to your checkpoint):**

```json
{
  "id2label": {"0": "DOES_NOT_PRESERVE", "1": "PRESERVES_MEANING"},
  "label2id": {"DOES_NOT_PRESERVE": 0, "PRESERVES_MEANING": 1}
}
```

---

## Model Description

* **Developed by:** Ontology Engineering Group (UPM). Authors: Isam Diab Lozano and Mari Carmen Suárez-Figueroa
* **Funded by:** "Ayudas para la contratación de personal investigador predoctoral en formación para el año 2022" (Reference: PIPF-2022/COM-25762), Comunidad Autónoma de Madrid (Spain)
* **Model type:** Encoder-only Transformer (RoBERTa) with a classification head
* **Language:** Spanish (es)
* **License:** Apache-2.0
* **Finetuned from model:** `PlanTL-GOB-ES/roberta-base-bne` (deprecated; see notice above)

---

## Uses

### Direct Use

* Automatic scoring of **meaning preservation** for Spanish **Easy-to-Read** adaptations.
* As a signal in content quality checks for accessibility pipelines (see the batch-scoring sketch below).
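
The single-pair example in **How to Use** scores one adaptation at a time. For pipeline-style quality checks, batching pairs is usually more practical. The sketch below is illustrative only: it assumes the suggested `id2label` mapping above (label id 1 = `PRESERVES_MEANING`) and a hypothetical review threshold of 0.5; check `model.config.id2label` and tune the threshold for your checkpoint.

```python
from typing import List, Tuple

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo = "oeg/RoBERTaSense-FACIL"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)
model.eval()


def score_pairs(pairs: List[Tuple[str, str]], batch_size: int = 16) -> List[float]:
    """Return P(label id 1) for each (original, adapted) pair.

    Assumes label id 1 is PRESERVES_MEANING, as in the suggested mapping above.
    """
    scores: List[float] = []
    for i in range(0, len(pairs), batch_size):
        batch = pairs[i : i + batch_size]
        originals = [o for o, _ in batch]
        adapted = [a for _, a in batch]
        # Encode each (original, adapted) pair jointly, padded to the longest in the batch
        inputs = tokenizer(originals, adapted, return_tensors="pt",
                           padding=True, truncation=True, max_length=512)
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(-1)
        scores.extend(probs[:, 1].tolist())
    return scores


# Example: flag low-confidence adaptations for human review (threshold is illustrative)
pairs = [
    ("El lobo, que parecía amable, engañó a Caperucita.",
     "El lobo parecía amable. El lobo engañó a Caperucita."),
]
for (orig, adapt), p in zip(pairs, score_pairs(pairs)):
    flag = "REVIEW" if p < 0.5 else "OK"
    print(f"{flag}\t{p:.2f}\t{adapt}")
```
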
### Out-of-Scope Use

* Clinical, legal, or other high-stakes decisions without human expert oversight.
* Non-Spanish or out-of-domain texts without prior adaptation or re-training.

---

## Bias, Risks, and Limitations

* **Domain limitation:** trained for Spanish E2R; performance may degrade on other genres/domains.
* **Binary labels:** compress nuanced cases; borderline adaptations may require human review.
* **Synthetic negatives:** not all human errors are covered by synthetic negative strategies.
* **Base deprecation:** the upstream base model is deprecated; security/robustness updates won't be inherited.

### Recommendations

* Calibrate probabilities (e.g., temperature scaling) and expose confidence scores.
* Use threshold tuning (e.g., Youden's J) to trade precision/recall for your setting (see the sketch at the end of this card).
* Keep a **human-in-the-loop** for critical use cases and periodic error audits.

---

## How to Get Started with the Model

See **How to Use** above. For pairwise inputs, encode as sentence pairs:

```python
inputs = tokenizer(text_original, text_adapted, return_tensors="pt", truncation=True, max_length=512)
```

---

## Training Details

### Training Data

* **Source:** Spanish pairs (*original - adapted*) curated/validated by experts.
* **Columns:** `text1` (original), `text2` (adaptation), `Label` (0/1), `neg_type`.
* **Labels:** `1 = PRESERVES_MEANING`, `0 = DOES_NOT_PRESERVE`.
* **Negative types** used in training data construction: `shuffle`, `dropout`, `mismatch` (derangement), `paraphrase_distortion`, `nli_contradiction`.
* **Split:** 80/20, stratified by `Label` (random_state=42).

### Training Procedure

#### Preprocessing

* Pair tokenization with truncation at 512 tokens:

```python
tokenizer(text1, text2, truncation=True, max_length=512)
```

#### Training Hyperparameters

* **Training regime:** fp16 mixed precision (if supported; otherwise fp32)
* **Arguments:**
  * `num_train_epochs=5`
  * `per_device_train_batch_size=32`
  * `per_device_eval_batch_size=16`
  * `learning_rate=2e-5`
  * `weight_decay=0.01`
  * `warmup_ratio=0.1`
  * `evaluation_strategy="epoch"`, `save_strategy="epoch"`
  * `load_best_model_at_end=True`, `metric_for_best_model="f1"`
* **Optimizer:** AdamW
* **Loss:** CrossEntropy (2 logits)

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

* Held-out 20% stratified split of the curated E2R pairs.

#### Factors

* Per-negative-type breakdown (e.g., performance on `mismatch`, `paraphrase_distortion`, etc.).

#### Metrics

* Accuracy, F1, ROC-AUC.

### Results

* Accuracy: `0.81`
* F1: `0.84`
* ROC-AUC: `0.83`
* Threshold tuned via Youden's J for operating point selection.

## Technical Specifications

### Model Architecture and Objective

* Encoder-only RoBERTa with a classification head (`Linear(hidden → 2)`).
* Objective: supervised cross-entropy on binary labels.

---

## Citation

**BibTeX:**

```bibtex
@software{roberta_facil_2025,
  title  = {RoBERTaSense-FACIL: Meaning Preservation for Easy-to-Read in Spanish},
  author = {Diab Lozano, Isam and Suárez-Figueroa, Mari Carmen},
  year   = {2025},
  url    = {https://huggingface.co/oeg/RoBERTaSense-FACIL}
}
```

**APA:**

Diab Lozano, I., & Suárez-Figueroa, M. C. (2025). *RoBERTaSense-FACIL: Meaning Preservation for Easy-to-Read in Spanish*. Hugging Face. [https://huggingface.co/oeg/RoBERTaSense-FACIL](https://huggingface.co/oeg/RoBERTaSense-FACIL)
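
---

## Appendix: Threshold Tuning Sketch (Youden's J)

The Recommendations and Evaluation sections mention selecting an operating threshold via Youden's J on the ROC curve. A minimal sketch of that idea is shown below; it is not the exact script used for this model. It assumes scikit-learn as an extra dependency, and the hypothetical arrays `y_true` / `y_score` stand in for held-out labels and positive-class probabilities.

```python
import numpy as np
from sklearn.metrics import roc_curve


def youden_j_threshold(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Pick the probability threshold that maximizes Youden's J = TPR - FPR.

    y_true  : binary labels of a held-out split (1 = PRESERVES_MEANING, assumed)
    y_score : model probabilities for the positive class
    """
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    j = tpr - fpr
    return float(thresholds[np.argmax(j)])


# Illustrative usage with dummy values; replace with real held-out scores
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.4, 0.7, 0.6, 0.3, 0.55, 0.8, 0.2])
print(f"Operating threshold: {youden_j_threshold(y_true, y_score):.2f}")
```
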