Improve model card: add metadata, links, and sample usage for DeEAR
This PR significantly enhances the model card for the `FreedomIntelligence/DeEAR_Base` model by adding crucial metadata and detailed information.
Specifically, it:
- Adds `library_name: transformers` metadata, since `config.json` and `preprocessor_config.json` show the model is built from `transformers` components (e.g., `model_type: wav2vec2`, `Wav2Vec2FeatureExtractor`). This enables the automated "how to use" widget on the model page; a rough loading sketch based on these config hints is included below.
- Adds `pipeline_tag: audio-classification` to categorize the model for discovery on the Hugging Face Hub, reflecting its task of evaluating speech expressiveness.
- Links to the official Hugging Face paper page: [Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment](https://huggingface.co/papers/2510.20513).
- Provides a link to the project page: [https://freedomintelligence.github.io/ExpressiveSpeech/](https://freedomintelligence.github.io/ExpressiveSpeech/).
- Includes a link to the GitHub repository: [https://github.com/FreedomIntelligence/ExpressiveSpeech](https://github.com/FreedomIntelligence/ExpressiveSpeech).
- Integrates a comprehensive "Introduction", "Key Features", and "Framework Overview" directly from the GitHub README to better describe the model.
- Incorporates a "Sample Usage" section with code snippets directly from the GitHub README's "Quick Start" guide, enabling users to quickly get started with inference.
- Adds the BibTeX citation for proper academic attribution.
These updates aim to make the model more discoverable, easier to understand, and more user-friendly on the Hugging Face Hub.
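For reference, here is a minimal sketch of what loading the checkpoint through `transformers` could look like, based only on the config hints noted above (`model_type: wav2vec2`, `Wav2Vec2FeatureExtractor`). The expressiveness scoring head is applied by the project's own inference script, so this only illustrates loading the backbone and feature extractor and is not the official usage:

```python
# Hedged sketch: load the wav2vec2 backbone and feature extractor that the
# config files point to. The DeEAR scoring head lives in the project's own
# inference code, so this is illustrative only.
import torch
from transformers import AutoFeatureExtractor, AutoModel

model_id = "FreedomIntelligence/DeEAR_Base"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)  # may warn about weights unused by the bare backbone

# 1 second of silence at 16 kHz as a stand-in for real audio
waveform = torch.zeros(16000).numpy()
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state
print(hidden_states.shape)
```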
Previously, the model card contained only minimal front matter:

```yaml
---
license: apache-2.0
---
```

The updated card is reproduced in full below.
---
license: apache-2.0
library_name: transformers
pipeline_tag: audio-classification
---

# DeEAR: Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment

This repository contains the DeEAR model as presented in the paper [Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment](https://huggingface.co/papers/2510.20513).

Project Page: [https://freedomintelligence.github.io/ExpressiveSpeech/](https://freedomintelligence.github.io/ExpressiveSpeech/)
Code Repository: [https://github.com/FreedomIntelligence/ExpressiveSpeech](https://github.com/FreedomIntelligence/ExpressiveSpeech)
Hugging Face Dataset: [FreedomIntelligence/ExpressiveSpeech](https://huggingface.co/datasets/FreedomIntelligence/ExpressiveSpeech)

<div align="center">
<img src="https://github.com/FreedomIntelligence/ExpressiveSpeech/raw/main/assets/Architecture.png" alt="DeEAR Framework Diagram" width="45%"/>
<br>
<em>Figure 1: The DeEAR Framework. (A) The training pipeline involves four stages: decomposition, sub-dimension modeling, learning a fusion function, and distillation. (B) Applications include data filtering and serving as a reward model.</em>
</div>

## Introduction
Recent speech-to-speech (S2S) models can generate intelligible speech but often lack natural expressiveness, largely due to the absence of a reliable evaluation metric. To address this, we present **DeEAR (Decoding the Expressive Preference of eAR)**, a novel framework that converts human preferences for speech expressiveness into an objective score.

Grounded in phonetics and psychology, DeEAR evaluates speech across three core dimensions: **Emotion**, **Prosody**, and **Spontaneity**. It achieves strong alignment with human perception (Spearman's Rank Correlation Coefficient, SRCC = 0.86) using fewer than 500 annotated samples.

Beyond reliable scoring, DeEAR enables fair benchmarking and targeted data curation. We applied DeEAR to build **ExpressiveSpeech**, a high-quality dataset, and used it to fine-tune an S2S model, improving its overall expressiveness score from 2.0 to 23.4 (on a 100-point scale).

## Key Features

* **Multi-dimensional Objective Scoring**: Decomposes speech expressiveness into quantifiable dimensions of Emotion, Prosody, and Spontaneity.
* **Strong Alignment with Human Perception**: Achieves a Spearman's Rank Correlation (SRCC) of **0.86** with human ratings for overall expressiveness.
* **Data-Efficient and Scalable**: Requires minimal annotated data, making it practical for deployment and scaling.
* **Dual Applications**:
  1. **Automated Model Benchmarking**: Ranks SOTA models in near-perfect agreement with human rankings (SRCC = **0.96**).
  2. **Evaluation-Driven Data Curation**: Efficiently filters and curates high-quality, expressive speech datasets.
* **Release of ExpressiveSpeech Dataset**: A new large-scale, bilingual (English-Chinese) dataset containing ~14,000 utterances of highly expressive speech.

## Quick Start (Inference)
To run inference with DeEAR, follow these steps:

1. **Clone the Repository**
   ```bash
   git clone https://github.com/FreedomIntelligence/ExpressiveSpeech.git
   cd ExpressiveSpeech
   ```

2. **Setup Environment**
   ```bash
   conda create -n DeEAR python=3.10
   conda activate DeEAR
   pip install -r requirements.txt
   conda install -c conda-forge ffmpeg
   ```

3. **Prepare Model**

   Download the DeEAR_Base model from [FreedomIntelligence/DeEAR_Base](https://huggingface.co/FreedomIntelligence/DeEAR_Base) and place it in the `./models/DeEAR_Base/` directory.
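   A rough way to fetch the checkpoint from Python is sketched below (this assumes the `huggingface_hub` package, which `transformers` already depends on; the `huggingface-cli download` command is an equivalent route):
   ```python
   # Hedged sketch: download the checkpoint into the layout expected above.
   from huggingface_hub import snapshot_download

   snapshot_download(
       repo_id="FreedomIntelligence/DeEAR_Base",
       local_dir="./models/DeEAR_Base",
   )
   ```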
4. **Run Inference**
   ```bash
   python inference.py \
       --model_dir ./models \
       --input_path /path/to/audio_folder \
       --output_file /path/to/save/my_scores.jsonl \
       --batch_size 64
   ```
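The exact fields written to the output JSONL are defined by `inference.py`; the rough post-processing sketch below just loads and prints a few rows without assuming any particular schema:

```python
# Hedged sketch: inspect the scores file written by inference.py.
# Field names are whatever the script emits; print a few rows to see them.
import json

with open("/path/to/save/my_scores.jsonl") as f:
    rows = [json.loads(line) for line in f]

print(f"{len(rows)} scored files")
for row in rows[:5]:
    print(row)
```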
## Citation

If you use our work in your research, please cite the following paper:

```bibtex
@article{lin2025decoding,
  title={Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment},
  author={Lin, Zhiyu and Yang, Jingwen and Zhao, Jiale and Liu, Meng and Li, Sunzhu and Wang, Benyou},
  journal={arXiv preprint arXiv:2510.20513},
  year={2025}
}
```