GoEmotions German — xlm-roberta-base (ONNX INT8)

28-label emotion classifier for German text, fine-tuned on the GoEmotions dataset together with its German machine translation.

Model Details

  • Base model: FacebookAI/xlm-roberta-base (270M params)
  • Training data: GoEmotions EN+DE bilingual (87K examples)
  • Translation: Helsinki-NLP/opus-mt-en-de (English → German)
  • Format: ONNX INT8 quantized (266MB)
  • Tokenizer: SentencePiece (tokenizer.json, Unigram)

Performance

Metric                        Score
F1 Macro (validation)         0.399
Top-1 accuracy (28×3 test)    64%
Top-3 accuracy (28×3 test)    77%
Sanity checks                 8/8
Perfect emotions              11/28
Missed emotions               3/28 (grief, pride, relief)
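
For readers who want to compute comparable numbers on their own test set, here is a minimal sketch of macro F1 and top-k accuracy. It assumes single-label evaluation (one gold emotion per sentence) and scores a class with no predictions and no gold examples as F1 = 0. The card does not state how its scores were computed, so both choices are assumptions. The function names and the toy data are hypothetical.

```python
import numpy as np

def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores (single-label setting)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # Assumption: a class that never occurs and is never predicted scores 0
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

def top_k_accuracy(y_true, probs, k):
    """Share of examples whose gold label is among the k highest-scoring labels."""
    y_true = np.asarray(y_true)
    probs = np.asarray(probs)
    top_k = np.argsort(probs, axis=1)[:, ::-1][:, :k]
    return float((top_k == y_true[:, None]).any(axis=1).mean())

# Toy data: 4 examples, 3 classes (illustrative only, not from the model)
toy_true = np.array([0, 1, 2, 2])
toy_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3],
])
toy_pred = toy_probs.argmax(axis=1)

print("macro F1:", macro_f1(toy_true, toy_pred, num_classes=3))
print("top-1:", top_k_accuracy(toy_true, toy_probs, k=1))
print("top-2:", top_k_accuracy(toy_true, toy_probs, k=2))
```

Macro averaging gives every emotion equal weight, so rare labels such as grief, pride, and relief pull the score down more than they would under a frequency-weighted average.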

Labels (28 GoEmotions)

admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise, neutral

Usage (Python + ONNX Runtime)

import onnxruntime as ort
import numpy as np
from tokenizers import Tokenizer

session = ort.InferenceSession("goemotions.onnx")
tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.enable_padding(length=128, pad_id=1, pad_token="<pad>")
tokenizer.enable_truncation(max_length=128)

text = "Ich bin heute so glücklich und dankbar"
enc = tokenizer.encode(text)
input_ids = np.array([enc.ids], dtype=np.int64)
attention_mask = np.array([enc.attention_mask], dtype=np.int64)

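# Run the model; the first output holds the logits for the single input sentence
logits = session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})[0][0]
# Numerically stable softmax over the 28 labels
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()
top_idx = int(probs.argmax())  # index of the most probable label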
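
The snippet above stops at an index. A minimal sketch of turning the probabilities into label names follows. It assumes the model's output order matches the label list in this card, which should be checked against the exported model's config. `LABELS`, `top_k_emotions`, and the demo probabilities are hypothetical and for illustration only.

```python
import numpy as np

# Assumption: output indices follow the order of the label list in this card
LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]

def top_k_emotions(probs, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = np.asarray(probs, dtype=np.float64)
    order = np.argsort(probs)[::-1][:k]
    return [(LABELS[i], float(probs[i])) for i in order]

# Stand-in probabilities; in practice pass the `probs` array from the usage snippet
demo = np.full(len(LABELS), 0.01)
demo[LABELS.index("joy")] = 0.50
demo[LABELS.index("gratitude")] = 0.24

for label, p in top_k_emotions(demo, k=2):
    print(f"{label}: {p:.2f}")
```

Reporting the top three labels rather than only the best one matches the top-3 accuracy figure in the Performance section.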

Training Details

  • Epochs: 6 (stopped early because of disk constraints; F1 was still climbing)
  • Learning rate: 1e-5
  • Batch size: 32
  • Warmup: 10%
  • Optimizer: AdamW (weight_decay=0.01)
  • Hardware: RTX 4000 Ada (RunPod)
  • Training time: ~75 minutes
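
As a rough worked example, the hyperparameters above imply the following step counts. The sketch assumes that all 87K examples were used for training (the card does not give the train/validation split), that there was no gradient accumulation, and that the learning rate decays linearly to zero after warmup. The card states only the warmup fraction, so the decay shape is an assumption. `plan_steps` and `lr_at` are hypothetical helper names.

```python
import math

def plan_steps(num_examples=87_000, batch_size=32, epochs=6, warmup_ratio=0.10):
    """Return (steps_per_epoch, total_steps, warmup_steps) for one optimizer step per batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    warmup_steps = int(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps

def lr_at(step, total_steps, warmup_steps, peak_lr=1e-5):
    """Linear warmup to peak_lr, then linear decay to zero (decay shape is assumed)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

steps_per_epoch, total_steps, warmup_steps = plan_steps()
print(steps_per_epoch, total_steps, warmup_steps)
print(lr_at(warmup_steps, total_steps, warmup_steps))
```

Under these assumptions the run covers roughly 16K optimizer steps, of which the first 1.6K are warmup.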

Limitations

  • Rare emotions (grief, pride, relief) have very few examples in GoEmotions — accuracy is low for these regardless of training
  • Translations via MarianMT may miss cultural nuances in German emotional expression
  • ONNX only: the PyTorch checkpoint is not included (contact the author for reproduction details)

Citation

If you use this model, please cite GoEmotions:

@inproceedings{demszky2020goemotions,
  title={GoEmotions: A Dataset of Fine-Grained Emotions},
  author={Demszky, Dorottya and others},
  booktitle={ACL},
  year={2020}
}

Author

Built by tojohere for SentiLog — an AI-powered journaling app.
