nielsr HF Staff committed on
Commit 3e57db9 · verified · Parent(s): 8a38103

Improve model card: add metadata, links, and sample usage for DeEAR


This PR significantly enhances the model card for the `FreedomIntelligence/DeEAR_Base` model by adding crucial metadata and detailed information.

Specifically, it:
- Adds `library_name: transformers` metadata, since `config.json` and `preprocessor_config.json` show the model is built on `transformers` components (e.g., `model_type: wav2vec2`, `Wav2Vec2FeatureExtractor`). This enables the automated "how to use" widget on the model page.
- Adds `pipeline_tag: audio-classification` to categorize the model for discovery on the Hugging Face Hub, reflecting its task of evaluating speech expressiveness.
- Links to the official Hugging Face paper page: [Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment](https://huggingface.co/papers/2510.20513).
- Provides a link to the project page: [https://freedomintelligence.github.io/ExpressiveSpeech/](https://freedomintelligence.github.io/ExpressiveSpeech/).
- Includes a link to the GitHub repository: [https://github.com/FreedomIntelligence/ExpressiveSpeech](https://github.com/FreedomIntelligence/ExpressiveSpeech).
- Integrates a comprehensive "Introduction", "Key Features", and "Framework Overview" directly from the GitHub README to better describe the model.
- Incorporates a "Sample Usage" section with code snippets directly from the GitHub README's "Quick Start" guide, enabling users to quickly get started with inference.
- Adds the BibTeX citation for proper academic attribution.

These updates aim to make the model more discoverable, easier to understand, and more user-friendly on the Hugging Face Hub.
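The "Sample Usage" section added by this PR ends with an `inference.py` command that writes per-file scores to a JSONL output file. As an illustration of how a downstream consumer might post-process that file, here is a minimal stdlib-only sketch; note that the `audio` and `score` field names are assumptions for demonstration, since the actual output schema of `inference.py` is not shown in this diff.

```python
import json
import statistics
from pathlib import Path

def summarize_scores(jsonl_path):
    """Parse a JSONL score file (one JSON object per line) and report
    the number of records and the mean score.

    The record schema assumed here ({"audio": ..., "score": ...}) is
    hypothetical, not documented output of inference.py.
    """
    scores = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            scores.append(float(record["score"]))
    return {"count": len(scores), "mean": statistics.mean(scores)}

# Demo with synthetic records (not real model output).
sample = Path("my_scores.jsonl")
sample.write_text(
    '{"audio": "a.wav", "score": 23.4}\n'
    '{"audio": "b.wav", "score": 2.0}\n',
    encoding="utf-8",
)
print(summarize_scores(sample))  # {'count': 2, 'mean': 12.7}
```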

Files changed (1):
  README.md +78 -3
README.md CHANGED
@@ -1,3 +1,78 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: audio-classification
+ ---
+
+ # DeEAR: Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment
+
+ This repository contains the DeEAR model as presented in the paper [Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment](https://huggingface.co/papers/2510.20513).
+
+ Project Page: [https://freedomintelligence.github.io/ExpressiveSpeech/](https://freedomintelligence.github.io/ExpressiveSpeech/)
+ Code Repository: [https://github.com/FreedomIntelligence/ExpressiveSpeech](https://github.com/FreedomIntelligence/ExpressiveSpeech)
+ Hugging Face Dataset: [FreedomIntelligence/ExpressiveSpeech](https://huggingface.co/datasets/FreedomIntelligence/ExpressiveSpeech)
+
+ <div align="center">
+ <img src="https://github.com/FreedomIntelligence/ExpressiveSpeech/raw/main/assets/Architecture.png" alt="DeEAR Framework Diagram" width="45%"/>
+ <br>
+ <em>Figure 1: The DeEAR Framework. (A) The training pipeline involves four stages: decomposition, sub-dimension modeling, learning a fusion function, and distillation. (B) Applications include data filtering and serving as a reward model.</em>
19
+ </p>
+
+ ## Introduction
+
+ Recent speech-to-speech (S2S) models can generate intelligible speech but often lack natural expressiveness, largely due to the absence of a reliable evaluation metric. To address this, we present **DeEAR (Decoding the Expressive Preference of eAR)**, a novel framework that converts human preferences for speech expressiveness into an objective score.
+
+ Grounded in phonetics and psychology, DeEAR evaluates speech across three core dimensions: **Emotion**, **Prosody**, and **Spontaneity**. It achieves strong alignment with human perception (Spearman's Rank Correlation Coefficient, SRCC = 0.86) using fewer than 500 annotated samples.
+
+ Beyond reliable scoring, DeEAR enables fair benchmarking and targeted data curation. We applied DeEAR to build **ExpressiveSpeech**, a high-quality dataset, and used it to fine-tune an S2S model, which improved its overall expressiveness score from 2.0 to 23.4 (on a 100-point scale).
+
+ ## Key Features
+
+ * **Multi-dimensional Objective Scoring**: Decomposes speech expressiveness into quantifiable dimensions of Emotion, Prosody, and Spontaneity.
+ * **Strong Alignment with Human Perception**: Achieves a Spearman's Rank Correlation (SRCC) of **0.86** with human ratings for overall expressiveness.
+ * **Data-Efficient and Scalable**: Requires minimal annotated data, making it practical for deployment and scaling.
+ * **Dual Applications**:
+     1. **Automated Model Benchmarking**: Ranks SOTA models with near-perfect correlation (SRCC = **0.96**) to human rankings.
+     2. **Evaluation-Driven Data Curation**: Efficiently filters and curates high-quality, expressive speech datasets.
+ * **Release of ExpressiveSpeech Dataset**: A new large-scale, bilingual (English-Chinese) dataset containing ~14,000 utterances of highly expressive speech.
+
+ ## Quick Start (Inference)
+
+ To get started with DeEAR, follow these steps to perform inference:
+
+ 1. **Clone the Repository**
+ ```bash
+ git clone https://github.com/FreedomIntelligence/ExpressiveSpeech.git
+ cd ExpressiveSpeech
+ ```
+
+ 2. **Setup Environment**
+ ```bash
+ conda create -n DeEAR python=3.10
+ conda activate DeEAR
+ pip install -r requirements.txt
+ conda install -c conda-forge ffmpeg
+ ```
+
+ 3. **Prepare Model**
+ Download the DeEAR_Base model from [FreedomIntelligence/DeEAR_Base](https://huggingface.co/FreedomIntelligence/DeEAR_Base) and place it in the `./models/DeEAR_Base/` directory.
+
+ 4. **Run Inference**
+ ```bash
+ python inference.py \
+     --model_dir ./models \
+     --input_path /path/to/audio_folder \
+     --output_file /path/to/save/my_scores.jsonl \
+     --batch_size 64
+ ```
+
+ ## Citation
+
+ If you use our work in your research, please cite the following paper:
+ ```bibtex
+ @article{lin2025decoding,
+   title={Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment},
+   author={Lin, Zhiyu and Yang, Jingwen and Zhao, Jiale and Liu, Meng and Li, Sunzhu and Wang, Benyou},
+   journal={arXiv preprint arXiv:2510.20513},
+   year={2025}
+ }
+ ```