T-EBERT: German IR Term Sense Disambiguation

A EuroBERT-610M model fine-tuned on class-balanced data for distribution-robust disambiguation of German International Relations (IR) term senses.

Model Description

This is a distribution-robust variant of T-EBERT, trained on artificially balanced data (50% IR sense, 50% colloquial) to learn semantic features without exploiting class frequency priors.

Key Differences from Standard T-EBERT

| Aspect | T-EBERT (Standard) | T-EBERT-Balanced |
|--------|--------------------|------------------|
| Training distribution | 70/30 (natural) | 50/50 (balanced) |
| F1 Score | 0.922 | 0.822 |
| Optimization goal | Peak performance | Distribution robustness |
| Feature reliance | Content + distribution | Semantic only |
| Best for | Production (known distribution) | Research (unknown distribution) |

Performance

  • F1 Score: 0.822 on balanced test set
  • Training distribution: 50% IR sense, 50% colloquial (artificially balanced)

| Term | F1 Score |
|------|----------|
| Kooperation | 0.934 |
| Entspannung | 0.916 |
| Intervention | 0.901 |
| Integration | 0.895 |
| Norm | 0.699 |
| Regime | 0.390 |

Intended Use

Primary use cases:

  • Research on distribution robustness in NLP
  • Corpora with unknown or varying class distributions
  • Domain transfer scenarios
  • Applications requiring symmetric per-class performance
  • Benchmarking against distribution-agnostic baselines

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "pdjohn/T-EBERT-term-sense-german"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
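# Note: depending on the transformers version, EuroBERT-based checkpoints may
# need trust_remote_code=True in the from_pretrained calls above.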

# Classify a sentence
sentence = "Die internationale Norm verbietet den Einsatz von Gewalt."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()

# Interpretation
labels = {0: "Colloquial", 1: "IR Sense"}
print(f"Prediction: {labels[prediction]}")

Training Details

Training Data

  • Corpus: Same as T-EBERT, but with balanced sampling
  • Size: 456 sentences (training) + 114 sentences (test)
  • Distribution: 50% IR sense, 50% colloquial (artificially balanced via undersampling; see the sketch after this list)
  • Terms: Norm, Kooperation, Regime, Integration, Intervention, Entspannung
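
In principle, a balanced split of this kind can be reproduced by undersampling the majority class until both senses are equally frequent. The sketch below is illustrative only: the file name and column names (sentence, term, label) are hypothetical and do not reflect the project's actual preprocessing code.

import pandas as pd

# Hypothetical annotated corpus with one sentence per row and a binary sense label.
df = pd.read_csv("annotated_sentences.csv")  # columns: sentence, term, label

# Undersample the majority class down to the size of the minority class (50/50 split).
minority_n = df["label"].value_counts().min()
balanced = (
    df.groupby("label")
      .sample(n=minority_n, random_state=42)
      .reset_index(drop=True)
)

print(balanced["label"].value_counts())  # both classes now have minority_n rows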

Training Procedure

  • Base model: EuroBERT-610M
  • Fine-tuning method: LoRA (Low-Rank Adaptation); a peft-style configuration sketch follows this list
    • Rank (r): 8
    • Alpha: 16
    • Target modules: q_proj, k_proj, v_proj
    • Bias training: Enabled (bias="all") - critical for balanced learning
    • Dropout: 0.05
  • Epochs: 5
  • Batch size: 8
  • Learning rate: 1e-4
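
For reference, a LoRA setup with these hyperparameters can be expressed with the peft library roughly as follows. This is a sketch, not the project's training script; the base-model ID (EuroBERT/EuroBERT-610m) and the trust_remote_code flag are assumptions.

from transformers import AutoModelForSequenceClassification, TrainingArguments
from peft import LoraConfig, TaskType, get_peft_model

# Assumed base checkpoint; trust_remote_code may be needed for the EuroBERT architecture.
base_model = AutoModelForSequenceClassification.from_pretrained(
    "EuroBERT/EuroBERT-610m", num_labels=2, trust_remote_code=True
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                            # rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj"],
    bias="all",                                     # bias training enabled, as noted above
    lora_dropout=0.05,
)
model = get_peft_model(base_model, lora_config)

training_args = TrainingArguments(
    output_dir="t-ebert-balanced",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    learning_rate=1e-4,
)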

Limitations

  • Lower peak performance: 0.82 vs 0.92 F1
  • Smaller training set: 456 vs 934 samples (undersampling)
  • See T-EBERT if peak performance on the natural class distribution is the priority

Acknowledgements

This work was conducted as part of the Tracing International Institutions and Behavior (TIIB) project at TU Darmstadt.

Affiliation: Chair of Transnational Governance, Department of Political Science, Technische Universität Darmstadt

We thank the TIIB project team for their support and the manual annotation of the training corpus.
