Article Building Conversational AI: A Deep Dive into Voice Agent Architectures and Best Practices Sep 2, 2025 • 15
patrickvonplaten/wavlm-libri-clean-100h-large Automatic Speech Recognition • Updated Dec 17, 2021 • 741 • 4
Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems Paper • 2509.13989 • Published Sep 17, 2025 • 3
CO-VADA: A Confidence-Oriented Voice Augmentation Debiasing Approach for Fair Speech Emotion Recognition Paper • 2506.06071 • Published Jun 6, 2025 • 1
EMO-Debias: Benchmarking Gender Debiasing Techniques in Multi-Label Speech Emotion Recognition Paper • 2506.04652 • Published Jun 5, 2025 • 1
Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning Paper • 2505.16220 • Published May 22, 2025 • 1
Revisiting Modeling and Evaluation Approaches in Speech Emotion Recognition: Considering Subjectivity of Annotators and Ambiguity of Emotions Paper • 2510.05934 • Published Oct 7, 2025 • 2
EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems Paper • 2508.17623 • Published Aug 25, 2025 • 1
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models Paper • 2509.26388 • Published Sep 30, 2025 • 26
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment Paper • 2507.02768 • Published Jul 3, 2025 • 18