ConvBERT-fine-tuned-CHP-Villain (Checkpoint 1782)

This model is a fine-tuned version of dbmdz/convbert-base-turkish-mc4-cased designed to detect "Villain" rhetoric targeting the Republican People's Party (CHP) in Turkish political media.

Model Description

  • Model type: ConvBERT-based binary text classification
  • Language: Turkish
  • Finetuned from model: dbmdz/convbert-base-turkish-mc4-cased
  • Label Mapping:
    • Label 1 (Villain): The sentence employs specific polarizing rhetoric to delegitimize or demonize CHP.
    • Label 0 (Non-Villain): The sentence is neutral, factual, or does not contain the specific rhetoric defined below.
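For reference, the label mapping above can be expressed as the id2label/label2id dictionaries a Hugging Face model config typically carries. A minimal sketch; the string names are illustrative and not necessarily those stored in the checkpoint:

```python
# Label mapping from the codebook above (names are illustrative).
id2label = {0: "Non-Villain", 1: "Villain"}
label2id = {name: idx for idx, name in id2label.items()}

print(label2id["Villain"])  # -> 1
```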

Label Definitions: What is "Villain" Rhetoric?

The model was trained to identify "Villain" rhetoric based on a codebook containing two primary components: Illegitimacy and Immorality.

1. Illegitimacy Component

Sentences that portray CHP as an illegitimate political actor, specifically:

  • Coups & Anti-Democracy: Claims that CHP is upholding the legacy of military coups, has a "coup mindset," or actively supports plans to overthrow the government.
  • Terrorism Links: Claims that CHP cooperates with or supports recognized terrorist organizations (PKK, FETÖ, DHKP-C) or collaborates with the HDP (framed as the "political wing" of the PKK).
  • Anti-National Interest: Claims that CHP works against the "general will" or the strategic interests of the Turkish nation (e.g., in foreign policy or counter-terrorism).
  • Foreign Influence: Implications that CHP cadres are bribed, coerced, or controlled by foreign powers/interests to undermine the state.

2. Immorality Component

Sentences that portray CHP as an immoral political actor, specifically:

  • Lack of Principles: Vilifying opposition members as liars, hypocrites, or morally bankrupt.
  • Corruption: Allegations of corruption, misconduct, or abuse of power (municipal or general).
  • Insults & Slander: Depictions of CHP politicians insulting, threatening, or slandering the public, religious values, or other politicians (including the President).
  • Disrespecting Values: Claims that CHP violates established public morals or religious norms.

Training

The model was trained on Politics/turkey-chp-finetune, a dataset of Turkish news sentences.

  • Labeling Strategy: The training labels were generated using GPT-4o, which was itself fine-tuned on a smaller set of human-labeled data (Politics/turkey-chp-gpt4o).
  • Knowledge Distillation: This ConvBERT model effectively "distills" the classification capabilities of the larger GPT-4o model into a smaller, more efficient architecture suitable for large-scale analysis.
  • Class Imbalance: To address imbalance in the training data, the model utilized sklearn.utils.class_weight.compute_class_weight to calculate balanced weights, which were passed to the CrossEntropyLoss function during training.
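For intuition, sklearn's compute_class_weight("balanced", ...) assigns each class the weight n_samples / (n_classes * count_c). A minimal pure-Python sketch of that formula, using illustrative label counts (the actual class ratio of the training data is not stated in this card):

```python
from collections import Counter

def balanced_class_weights(labels):
    """Replicates sklearn's compute_class_weight("balanced"):
    weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in sorted(counts)}

# Illustrative 3:1 imbalance; the real ratio is not reported here.
labels = [0] * 3 + [1] * 1
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.666..., 1: 2.0} -- the minority class is upweighted
# These would then be passed to the loss, e.g.:
# loss_fn = torch.nn.CrossEntropyLoss(weight=torch.tensor([weights[0], weights[1]]))
```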

Hyperparameters

  • Learning Rate: 5e-5
  • Batch Size: 32 (Train), 64 (Eval)
  • Epochs: 3
  • Warmup Steps: 500
  • Weight Decay: 0.01
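The warmup setting corresponds to the linear warmup-then-decay schedule commonly used with transformers (get_linear_schedule_with_warmup). A minimal sketch of that schedule in plain Python; the total step count here is illustrative, since the exact number of training steps is not stated in this card:

```python
def linear_warmup_lr(step, base_lr=5e-5, warmup_steps=500, total_steps=3000):
    """Linear ramp from 0 to base_lr over warmup_steps,
    then linear decay back to 0 by total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(linear_warmup_lr(250))  # halfway through warmup -> 2.5e-05
print(linear_warmup_lr(500))  # peak learning rate -> 5e-05
```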

Evaluation Results

The model was evaluated on a held-out dataset: Politics/turkey-chp-heldout.

Performance of Checkpoint 1782:

Metric             | Score | Note
Positive Precision | 0.802 | When the model predicts "Villain", it is correct 80.2% of the time.
Positive Recall    | 0.701 | The model catches 70.1% of all actual "Villain" sentences.
Positive F1        | 0.748 | Harmonic mean of precision and recall for the target class.
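As a quick consistency check, the reported F1 can be recomputed from the reported precision and recall:

```python
# F1 is the harmonic mean of precision and recall.
precision, recall = 0.802, 0.701
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.748, matching the table above
```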

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Politics/turkey-chp-villain" # Replace with your actual repo name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "CHP, terör örgütleriyle kol kola yürüyerek ülkeye ihanet ediyor."

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_id = logits.argmax().item()
print(f"Prediction: {predicted_class_id}")
# Output: 1 (Villain) or 0 (Non-Villain)
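If you need a confidence score rather than a hard label, apply a softmax to the logits. A minimal sketch in plain Python with illustrative logit values; in the snippet above you could equivalently call torch.softmax(logits, dim=-1):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logit pair (Non-Villain, Villain); real values come
# from logits[0].tolist() in the snippet above.
probs = softmax([-1.2, 2.3])
print(f"P(Villain) = {probs[1]:.3f}")
```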

Limitations

  • Error Propagation: The training data uses "silver" labels generated by a fine-tuned GPT-4o. While the GPT-4o teacher was trained on human-labeled data, this model essentially learns to approximate the teacher's application of the codebook, so any errors or biases in the teacher's predictions propagate to this ConvBERT model.
  • Domain Specificity: The model is highly specific to the Turkish political context (2010s-2020s) and the specific rhetorical style used against the CHP.

Acknowledgments

I am deeply grateful to the University of Chicago Forum for Free Inquiry and Expression for their generous funding of this research.
