---
license: cc-by-sa-4.0
datasets:
- GiliGold/VAD_KnessetCorpus
- HaifaCLGroup/KnessetCorpus
- GiliGold/Hebrew_VAD_lexicon
language:
- he
tags:
- vad
- valence
- arousal
- dominance
- regression
- knesset
---

# VAD Binomial Regression Models

This repository contains three binomial regression models that predict VAD (Valence, Arousal, Dominance) scores for text inputs.

Each model is stored as a separate pickle (.pkl) file:

- **valence_model.pkl**: Predicts the Valence score (positivity/negativity).
- **arousal_model.pkl**: Predicts the Arousal score (level of excitement or calm).
- **dominance_model.pkl**: Predicts the Dominance score (sense of control or influence).

All scores are normalized on a scale from 0 to 1.
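
The card does not describe how the scores were brought onto the 0-to-1 scale. As an illustration only, one common choice is min-max scaling. The sketch below assumes hypothetical raw ratings on a 1-to-5 scale, and the function name `min_max_normalize` is made up for this example.

```python
def min_max_normalize(score: float, low: float = 1.0, high: float = 5.0) -> float:
    """Map a raw rating in [low, high] linearly onto [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside [{low}, {high}]")
    return (score - low) / (high - low)


# Hypothetical raw ratings on an assumed 1-to-5 scale (not taken from the datasets)
raw_ratings = [1.0, 3.0, 4.2, 5.0]
normalized = [min_max_normalize(r) for r in raw_ratings]
print(normalized)
```

The midpoint of the assumed raw scale maps to 0.5, and the two endpoints map to 0 and 1.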

Before making predictions, input text must be converted into embeddings using the [Knesset-multi-e5-large](https://huggingface.co/GiliGold/Knesset-multi-e5-large) model. The embeddings are then fed into the regression models.

## Training Data

The models were trained on a combination of datasets to make the predictions more robust and generalizable:

- A Hebrew version of the [EmoBank dataset](https://aclanthology.org/E17-2092/) (Buechel and Hahn, 2017): A comprehensive dataset of emotional text that we automatically translated to Hebrew using [google/madlad400-3b-mt](https://huggingface.co/google/madlad400-3b-mt).
- [Hebrew VAD Lexicon](https://huggingface.co/datasets/GiliGold/Hebrew_VAD_lexicon): A lexicon that provides VAD scores for Hebrew words.
- [Knesset Sentences](https://huggingface.co/datasets/GiliGold/VAD_KnessetCorpus): A manually annotated set of 120 Knesset sentences with VAD scores, serving as an additional benchmark and source of training data.

Combining these sources exposes the models to emotional language from several text domains, with an emphasis on Hebrew.

## Model Details

- Model Type: Binomial Regression
- Input: An embedding vector of the text, produced by Knesset-multi-e5-large (feature extraction must match the procedure used in training).
- Output: VAD scores (valence, arousal, and dominance) on a continuous scale from 0 to 1.

Each model is provided as a .pkl file and can be loaded using Python's pickle module.
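
Binomial regression is a generalized linear model whose predicted mean lies between 0 and 1, which matches the output range above. The card does not state the link function or the fitted parameters, so the sketch below is only an illustration: it assumes the logit link (the canonical link for the binomial family), and the function name, weights, and toy embedding are all made up.

```python
import math
from typing import Sequence


def binomial_mean(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Predicted mean of a binomial GLM under an assumed logit link."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Inverse logit (sigmoid), written in a numerically stable form
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


# Hypothetical 3-dimensional embedding and weights (real embeddings are much larger)
toy_embedding = [0.2, -0.1, 0.4]
toy_weights = [1.5, 0.5, -1.0]
print(binomial_mean(toy_weights, 0.0, toy_embedding))
```

Whatever the value of the linear predictor, the inverse link squashes it into the 0-to-1 range, so no clipping of the output is needed.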

## Usage Example

```python
import pickle

from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer

# Example Hebrew input ("This is an example sentence")
sentence = "זה משפט לדוגמה"

# Convert the input text into an embedding using Knesset-multi-e5-large
model = SentenceTransformer("GiliGold/Knesset-multi-e5-large")
embedding_vector = model.encode(sentence)

# Load the valence model (use either option)
# Option 1: Manually download the files from
# https://huggingface.co/GiliGold/VAD_binomial_regression_models/tree/main
with open("valence_model.pkl", "rb") as file:
    valence_model = pickle.load(file)

# Option 2: Download using the Hugging Face Hub
repo_id = "GiliGold/VAD_binomial_regression_models"
model_v_path = hf_hub_download(repo_id=repo_id, filename="valence_model.pkl")
with open(model_v_path, "rb") as f:
    valence_model = pickle.load(f)

# `embedding_vector` is the vector obtained from the Knesset-multi model,
# wrapped in a list to form a batch of one
valence_score = valence_model.predict([embedding_vector])

print(f"Predicted Valence Score: {valence_score[0]}")
```
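
The same steps apply to the arousal and dominance models. Below is a minimal sketch that loads all three files and returns one score per dimension. The helper names `load_vad_models` and `predict_vad` are hypothetical, and the sketch assumes each unpickled object exposes `predict` in the same way as the valence model above.

```python
import pickle
from typing import Any, Dict, Sequence

REPO_ID = "GiliGold/VAD_binomial_regression_models"
DIMENSIONS = ("valence", "arousal", "dominance")


def load_vad_models(repo_id: str = REPO_ID) -> Dict[str, Any]:
    """Download and unpickle the three regression models from the Hub."""
    # Imported lazily so predict_vad can be used without huggingface_hub installed
    from huggingface_hub import hf_hub_download

    models: Dict[str, Any] = {}
    for name in DIMENSIONS:
        path = hf_hub_download(repo_id=repo_id, filename=f"{name}_model.pkl")
        with open(path, "rb") as f:
            models[name] = pickle.load(f)
    return models


def predict_vad(models: Dict[str, Any], embedding: Sequence[float]) -> Dict[str, float]:
    """Return one score per dimension for a single embedding vector."""
    # Each model receives a batch of one row, as in the valence example
    return {name: float(m.predict([embedding])[0]) for name, m in models.items()}
```

With `embedding_vector` computed as in the example above, `predict_vad(load_vad_models(), embedding_vector)` returns a dictionary keyed by `valence`, `arousal`, and `dominance`.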

Cite:

@misc{goldin2025unveilingaffectivepolarizationtrends,
      title={Unveiling Affective Polarization Trends in Parliamentary Proceedings},
      author={Gili Goldin and Ella Rabinovich and Shuly Wintner},
      year={2025},
      eprint={2512.05231},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.05231},
}