TermBERT
Fine-tuned EuroBERT-610M on balanced data for distribution-robust German IR term sense disambiguation.
This is a distribution-robust variant of T-EBERT, trained on artificially balanced data (50% IR sense, 50% colloquial) to learn semantic features without exploiting class frequency priors.
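For illustration, one way to produce such a balanced training set is to downsample the majority sense of the naturally distributed corpus until both labels are equally frequent. A minimal sketch, assuming the annotated corpus lives in a pandas DataFrame with a binary `label` column (1 = IR sense, 0 = colloquial); the column name, file name, and seed are illustrative assumptions, not the project's actual pipeline:

```python
import pandas as pd

def balance_by_downsampling(df: pd.DataFrame, label_col: str = "label", seed: int = 42) -> pd.DataFrame:
    """Downsample the majority sense so both labels occur equally often (50/50)."""
    minority_size = df[label_col].value_counts().min()
    balanced = (
        df.groupby(label_col, group_keys=False)
          .apply(lambda g: g.sample(n=minority_size, random_state=seed))
    )
    # Shuffle so the two senses are interleaved rather than blocked by label
    return balanced.sample(frac=1, random_state=seed).reset_index(drop=True)

# Hypothetical usage: a naturally ~70/30 corpus becomes 50/50 after downsampling
# df = pd.read_csv("annotated_corpus.csv")
# train_df = balance_by_downsampling(df)
```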
| Aspect | T-EBERT (Standard) | T-EBERT-Balanced |
|---|---|---|
| Training distribution | 70/30 (natural) | 50/50 (balanced) |
| F1 Score | 0.922 | 0.822 |
| Optimization goal | Peak performance | Distribution robustness |
| Feature reliance | Content + distribution | Semantic only |
| Best for | Production (known distribution) | Research (unknown distribution) |
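The robustness claim can be probed by re-scoring a model on test sets resampled to different IR/colloquial ratios and checking how much F1 moves. The sketch below is one way to do this, not the project's published evaluation code; `predict_fn`, the example dictionaries, and the ratio grid are assumptions for illustration:

```python
import random
from sklearn.metrics import f1_score

def resample_to_ratio(examples, ir_ratio, n=500, seed=0):
    """Build a test set whose share of IR-sense items (label 1) equals ir_ratio."""
    rng = random.Random(seed)
    ir = [e for e in examples if e["label"] == 1]
    colloquial = [e for e in examples if e["label"] == 0]
    n_ir = round(n * ir_ratio)
    sample = rng.choices(ir, k=n_ir) + rng.choices(colloquial, k=n - n_ir)
    rng.shuffle(sample)
    return sample

def f1_at_ratio(predict_fn, examples, ir_ratio):
    """Score a classifier (sentence -> 0/1) on a test set with a fixed sense ratio."""
    sample = resample_to_ratio(examples, ir_ratio)
    y_true = [e["label"] for e in sample]
    y_pred = [predict_fn(e["sentence"]) for e in sample]
    return f1_score(y_true, y_pred)

# Hypothetical sweep: a distribution-robust model should stay relatively flat across ratios
# for ratio in (0.3, 0.5, 0.7):
#     print(ratio, f1_at_ratio(my_predict_fn, test_examples, ratio))
```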
Per-term F1 scores:

| Term | F1 Score |
|---|---|
| Kooperation | 0.934 |
| Entspannung | 0.916 |
| Intervention | 0.901 |
| Integration | 0.895 |
| Norm | 0.699 |
| Regime | 0.390 |
Primary use case: classifying whether a German political-science term in a sentence is used in its IR sense or colloquially, especially when the sense distribution of the target corpus is unknown.

Example usage:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "pdjohn/T-EBERT-term-sense-german"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Classify a sentence
sentence = "Die internationale Norm verbietet den Einsatz von Gewalt."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)

prediction = torch.argmax(outputs.logits, dim=1).item()

# Interpretation
labels = {0: "Colloquial", 1: "IR Sense"}
print(f"Prediction: {labels[prediction]}")
```
bias="all") - critical for balanced learningThis work was conducted as part of the Tracing International Institutions and Behavior (TIIB) project at TU Darmstadt.
Affiliation: Chair of Transnational Governance, Department of Political Science, Technische Universität Darmstadt
We thank the TIIB project team for their support and the manual annotation of the training corpus.
Base model: EuroBERT/EuroBERT-610m