Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages Paper • 2303.01037 • Published Mar 2, 2023
SLM: Bridge the thin gap between speech and text foundation models Paper • 2310.00230 • Published Sep 30, 2023
E3 TTS: Easy End-to-End Diffusion-based Text to Speech Paper • 2311.00945 • Published Nov 2, 2023
Gemini: A Family of Highly Capable Multimodal Models Paper • 2312.11805 • Published Dec 19, 2023
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8, 2024
Noise2Music: Text-conditioned Music Generation with Diffusion Models Paper • 2302.03917 • Published Feb 8, 2023
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis Paper • 2106.09660 • Published Jun 17, 2021