NX Chen

cnxhk

AI & ML interests

LLM, Audio

authored 9 papers 6 months ago

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

Paper • 2303.01037 • Published Mar 2, 2023

ESPnet: End-to-End Speech Processing Toolkit

Paper • 1804.00015 • Published Mar 30, 2018

SLM: Bridge the thin gap between speech and text foundation models

Paper • 2310.00230 • Published Sep 30, 2023 • 1

E3 TTS: Easy End-to-End Diffusion-based Text to Speech

Paper • 2311.00945 • Published Nov 2, 2023 • 16

WaveGrad: Estimating Gradients for Waveform Generation

Paper • 2009.00713 • Published Sep 2, 2020

Gemini: A Family of Highly Capable Multimodal Models

Paper • 2312.11805 • Published Dec 19, 2023 • 47

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8, 2024 • 66

Noise2Music: Text-conditioned Music Generation with Diffusion Models

Paper • 2302.03917 • Published Feb 8, 2023

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

Paper • 2106.09660 • Published Jun 17, 2021