ReViSE: Towards Reason-Informed Video Editing in Unified Models with Self-Reflective Learning Paper • 2512.09924 • Published 17 days ago • 3
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published Mar 11, 2025 • 71
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens Paper • 2503.01710 • Published Mar 3, 2025 • 6
ChatMusician: Understanding and Generating Music Intrinsically with LLM Paper • 2402.16153 • Published Feb 25, 2024 • 59
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training Paper • 2306.00107 • Published May 31, 2023 • 4
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model Paper • 2305.06908 • Published May 11, 2023 • 6