Speech Models 🎧 - a MElHuseyni Collection

Updated Aug 25, 2025

ICTNLP/Llama-3.1-8B-Omni

Updated Nov 14, 2024 • 387 • 418
AudioPaLM: A Large Language Model That Can Speak and Listen

Paper • 2306.12925 • Published Jun 22, 2023 • 56
OpenMOSS-Team/SpeechGPT-7B-cm

Text Generation • Updated Sep 15, 2023 • 129 • 8
parler-tts/parler_tts_mini_v0.1

Text-to-Speech • 0.6B • Updated Apr 30, 2024 • 3.48k • 358
parler-tts/parler-tts-mini-expresso

Text-to-Speech • Updated May 21, 2024 • 777 • 117
ylacombe/expresso

Viewer • Updated Apr 30, 2024 • 11.6k • 1.04k • 89
parler-tts/parler-tts-large-v1

Text-to-Speech • 2B • Updated Nov 22, 2024 • 11.2k • 273
parler-tts/parler-tts-mini-v1

Text-to-Speech • 0.9B • Updated Nov 25, 2024 • 19.5k • 153
parler-tts/parler-tts-mini-jenny-30H

Text-to-Speech • 0.6B • Updated Apr 30, 2024 • 121 • 8
google/flan-t5-base

Updated Jul 17, 2023 • 2.1M • 1.07k
parler-tts/dac_44khZ_8kbps

76.7M • Updated Apr 10, 2024 • 509 • 19
distil-whisper/distil-large-v3

Automatic Speech Recognition • 0.8B • Updated 28 days ago • 1.36M • 376
distil-whisper/distil-large-v3-ggml

Automatic Speech Recognition • Updated Mar 21, 2024 • 24
distil-whisper/distil-large-v3-ct2

Automatic Speech Recognition • Updated Mar 22, 2024 • 189 • 6
distil-whisper/distil-large-v3-openai

Automatic Speech Recognition • Updated Mar 27, 2024 • 4
distil-whisper/distil-large-v2

Automatic Speech Recognition • 0.8B • Updated 28 days ago • 9.2k • 516
distil-whisper/distil-medium.en

Automatic Speech Recognition • 0.4B • Updated 28 days ago • 8.18k • 127
distil-whisper/distil-small.en

Automatic Speech Recognition • 0.2B • Updated 28 days ago • 10.7k • 112
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Paper • 2311.00430 • Published Nov 1, 2023 • 56
coqui/XTTS-v2

Text-to-Speech • Updated Dec 11, 2023 • 8.57M • 3.55k
suno/bark

Text-to-Speech • Updated Oct 4, 2023 • 18.5k • 1.52k
OuteAI/OuteTTS-0.1-350M

Text-to-Speech • Updated Apr 17, 2025 • 115 • 302
microsoft/speecht5_tts

Text-to-Speech • Updated Nov 8, 2023 • 121k • 830
fixie-ai/ultravox-v0_4_1-llama-3_1-8b

Audio-Text-to-Text • Updated May 6, 2025 • 143 • 99
fixie-ai/ultravox-v0_4_1-llama-3_1-70b

Audio-Text-to-Text • 58.7M • Updated May 6, 2025 • 13 • 24
fixie-ai/ultravox-v0_4_1-mistral-nemo

Audio-Text-to-Text • Updated May 6, 2025 • 357 • 26
facebook/seamless-m4t-v2-large

Automatic Speech Recognition • 2B • Updated Jan 4, 2024 • 81.6k • 982
nvidia/diar_sortformer_4spk-v1

Automatic Speech Recognition • 0.1B • Updated Dec 15, 2025 • 16.4k • 138
amiriparian/ExHuBERT

Audio Classification • Updated Dec 15, 2024 • 153 • 19
BUT-FIT/DiCoW_v3_2

Automatic Speech Recognition • 1.0B • Updated Sep 2, 2025 • 1.6k • 9
pyannote/segmentation-3.0

Voice Activity Detection • Updated May 10, 2024 • 9.99M • 993
SWivid/F5-TTS

Text-to-Speech • Updated Mar 21, 2025 • 528k • 1.17k
SWivid/E2-TTS

Text-to-Speech • Updated Mar 12, 2025 • 113k • 57
ResembleAI/chatterbox

Text-to-Speech • Updated 26 days ago • 2.18M • 1.59k
NAMAA-Space/EgypTalk-ASR-v2

Updated Aug 9, 2025 • 1.1k • 12
nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0

Automatic Speech Recognition • Updated Oct 21, 2025 • 110k • 38
nvidia/canary-1b-v2

Automatic Speech Recognition • Updated Dec 3, 2025 • 109k • 384
nvidia/canary-1b-flash

Automatic Speech Recognition • 0.8B • Updated Dec 3, 2025 • 239k • 272
nvidia/parakeet-tdt-0.6b-v3

Automatic Speech Recognition • 0.6B • Updated Apr 16 • 316k • 850
Open ASR Leaderboard 🏆

Explore and compare speech recognition model benchmarks • Running on CPU Upgrade • Agents • Featured • 1.34k
microsoft/VibeVoice-1.5B

Text-to-Speech • 3B • Updated Jan 22 • 216k • 2.38k