Height Estimation Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with a Support Vector Regression (SVR) model to predict speaker height from audio input. It was trained on the VoxCeleb2 dataset and evaluated on the VoxCeleb2 and TIMIT test sets.

Model Details

Input: Audio file (converted to 16 kHz, single channel)
Output: Predicted height in centimeters (continuous value)
Speaker embedding: 192-dimensional ECAPA-TDNN embedding from SpeechBrain
Regressor: Support Vector Regression (SVR), with hyperparameters tuned using Optuna
Performance:
- VoxCeleb2 test set: 6.01 cm Mean Absolute Error (MAE)
- TIMIT test set: 6.02 cm Mean Absolute Error (MAE)
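
For reference, MAE is the mean of the absolute differences between predicted and true heights. The following is a minimal sketch in plain Python; the function name and the height values are illustrative assumptions, not data or code from the model.

```python
def mean_absolute_error(y_true, y_pred):
    """Return the mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only (heights in cm), not data from VoxCeleb2 or TIMIT.
true_heights = [170.0, 182.0, 165.0]
predicted_heights = [175.0, 178.0, 171.0]

mae = mean_absolute_error(true_heights, predicted_heights)
print(f"MAE: {mae:.2f} cm")  # absolute errors are 5, 4 and 6 cm, so this prints 5.00
```

An MAE of about 6 cm therefore means that, on average, a prediction is off by roughly 6 cm in either direction.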

Training Data

The model was trained on the VoxCeleb2 dataset enriched with speaker height labels (see the paper cited below for details).

Audio preprocessing:
- Converted to WAV format, single channel, 16 kHz sampling rate, 256 kbps bitrate
- Applied SileroVAD for voice activity detection, taking the first voiced segment
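
The preprocessing steps above can be sketched as follows. The stated bitrate of 256 kbit/s is consistent with 16 kHz single-channel audio if 16-bit PCM samples are assumed; the bit depth is not stated here, so treat it as an assumption. The helper `first_voiced_segment` is hypothetical. It expects VAD output as a list of dicts with `start` and `end` sample indices, which is the shape returned by Silero VAD's `get_speech_timestamps`. The timestamps below are hand-written so that the snippet runs without loading any model.

```python
SAMPLE_RATE_HZ = 16_000
CHANNELS = 1
BITS_PER_SAMPLE = 16  # assumption: 16-bit PCM, not stated in the source

# Uncompressed PCM bitrate = sample rate * channels * bits per sample.
bitrate_kbps = SAMPLE_RATE_HZ * CHANNELS * BITS_PER_SAMPLE / 1000
print(f"Uncompressed bitrate: {bitrate_kbps:.0f} kbps")


def first_voiced_segment(samples, speech_timestamps):
    """Return the samples of the earliest voiced segment, or an empty list."""
    if not speech_timestamps:
        return []
    first = min(speech_timestamps, key=lambda seg: seg["start"])
    return samples[first["start"]:first["end"]]


# Toy signal: one second of placeholder samples (each value is its own index).
samples = list(range(SAMPLE_RATE_HZ))
# Hand-written timestamps standing in for real VAD output.
timestamps = [{"start": 4000, "end": 8000}, {"start": 12000, "end": 15000}]

segment = first_voiced_segment(samples, timestamps)
print(f"First voiced segment: {len(segment) / SAMPLE_RATE_HZ:.2f} s")
```

Because only the first voiced segment is kept, later speech in the same file does not contribute to the embedding under this scheme.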

Installation

You can install the package directly from GitHub:

pip install git+https://github.com/griko/voice-height-regression.git

Usage

from voice_height_regressor import HeightRegressionPipeline

# Load the pipeline
regressor = HeightRegressionPipeline.from_pretrained(
    "griko/height_reg_svr_ecapa_voxceleb"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted height: {result[0]:.1f} cm")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print("Predicted heights (cm): " + ", ".join(f"{h:.1f}" for h in results))
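
The pipeline reports height in centimeters. If imperial units are needed, a small conversion can be applied to the output. `cm_to_feet_inches` below is a hypothetical helper, not part of the `voice_height_regressor` package, and the input value is illustrative rather than an actual model prediction.

```python
CM_PER_INCH = 2.54
INCHES_PER_FOOT = 12


def cm_to_feet_inches(height_cm):
    """Convert centimeters to (feet, inches), rounded to the nearest inch."""
    total_inches = round(height_cm / CM_PER_INCH)
    feet, inches = divmod(total_inches, INCHES_PER_FOOT)
    return feet, inches


# Illustrative value standing in for something like result[0].
feet, inches = cm_to_feet_inches(175.3)
print(f"{feet} ft {inches} in")
```

Rounding to the nearest inch is far finer than the reported error of about 6 cm, so the conversion adds no meaningful loss of precision.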

Limitations

- The model was trained on celebrity voices from YouTube interviews
- Performance may vary across audio qualities and recording conditions
- Height predictions are estimates and should not be used for medical or legal purposes

Citation

If you use this model in your research, please cite:

@misc{koushnir2025vanpyvoiceanalysisframework,
      title={VANPY: Voice Analysis Framework}, 
      author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
      year={2025},
      eprint={2502.17579},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.17579}, 
}

