Paper: VANPY: Voice Analysis Framework (arXiv:2502.17579)
This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with a support vector regression (SVR) model to predict speaker height, in centimeters, from audio input. It was trained on VoxCeleb2 and evaluated on the VoxCeleb2 and TIMIT datasets.
The training data is a height-enriched version of the VoxCeleb2 dataset; see the paper for details.
You can install the package directly from GitHub:
pip install git+https://github.com/griko/voice-height-regression.git
from voice_height_regressor import HeightRegressionPipeline
# Load the pipeline
regressor = HeightRegressionPipeline.from_pretrained(
"griko/height_reg_svr_ecapa_voxceleb"
)
# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted height: {result[0]:.1f} cm")
# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted heights: {[f'{h:.1f}' for h in results]} cm")
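To check batch predictions against known speaker heights, the following is a minimal sketch rather than part of the package. The helper names `predict_by_path` and `mean_absolute_error` are hypothetical, and the sketch assumes that the pipeline returns one height in centimeters per input path, in input order, as the batch example above suggests.

```python
from typing import Callable, Dict, Sequence


def predict_by_path(
    regressor: Callable[[list], Sequence[float]],
    paths: Sequence[str],
) -> Dict[str, float]:
    """Map each audio path to its predicted height in cm.

    Assumption: `regressor` accepts a list of paths and returns one
    height per path in the same order (hypothetical helper, not part
    of voice_height_regressor).
    """
    heights = regressor(list(paths))
    if len(heights) != len(paths):
        raise ValueError("expected one prediction per input path")
    return {path: float(height) for path, height in zip(paths, heights)}


def mean_absolute_error(
    predicted: Dict[str, float],
    reference: Dict[str, float],
) -> float:
    """Mean absolute error in cm over the paths present in both mappings."""
    shared = [path for path in predicted if path in reference]
    if not shared:
        raise ValueError("no overlapping paths to compare")
    total = sum(abs(predicted[path] - reference[path]) for path in shared)
    return total / len(shared)
```

With the pipeline loaded as above, `predict_by_path(regressor, ["audio1.wav", "audio2.wav"])` would give a dictionary that can be passed to `mean_absolute_error` together with your own reference heights.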
If you use this model in your research, please cite:
@misc{koushnir2025vanpyvoiceanalysisframework,
title={VANPY: Voice Analysis Framework},
author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
year={2025},
eprint={2502.17579},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2502.17579},
}