RogerZhuo
AI & ML interests
None yet
Recent Activity
liked a model 6 days ago
microsoft/TRELLIS.2-4B
liked a model 9 days ago
FunAudioLLM/Fun-ASR-Nano-2512
upvoted a paper 16 days ago
Deep Research: A Systematic Survey
Organizations
Reading
Music
ElectricAlexis/NotaGen
Updated • 149
ASLP-lab/LLaSE-G1
Audio-to-Audio • Updated • 25
Running on Zero • Featured • 663
Di♪♪Rhythm
🎶 Blazingly Fast and Embarrassingly Simple Song Generation
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
Paper • 2503.01183 • Published • 29
AI Arena
I2V
image-to-video
Wan-AI/Wan2.1-T2V-1.3B
Text-to-Video • Updated • 10.7k • 406
VBench: Comprehensive Benchmark Suite for Video Generative Models
Paper • 2311.17982 • Published • 9
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
Paper • 2411.13503 • Published • 34
tencent/HunyuanVideo-I2V
Image-to-Video • Updated • 298 • 345
LLM
Foundation model topics
must-read-papers
AI Papers
Reinforcement Learning: An Overview
Paper • 2412.05265 • Published • 8
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Paper • 2411.01156 • Published • 11
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Paper • 2503.21755 • Published • 33
Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published • 168
OCR
images
images
black-forest-labs/FLUX.1-dev
Text-to-Image • Updated • 798k • 12.1k
cagliostrolab/animagine-xl-4.0
Text-to-Image • Updated • 210k • 366
Running on L4 • Featured • 282
Thera Arbitrary-Scale Super-Resolution
🔥 Enhance images by increasing their resolution
stepfun-ai/Step1X-Edit
Image-to-Image • Updated • 121 • 326
TTS
Speech-related
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Paper • 2307.16430 • Published • 4
Zyphra/Zonos-v0.1-transformer
Text-to-Speech • Updated • 35.2k • 422
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Paper • 2502.05512 • Published • 6
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
Paper • 2502.11946 • Published • 3
virtual try-on
Virtual makeover
Learning Flow Fields in Attention for Controllable Person Image Generation
Paper • 2412.08486 • Published • 36
franciszzj/Leffa
Image-to-Image • Updated • 340
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
Paper • 2411.18350 • Published • 29
Running on Zero • 61
TryOffDiff
🔥 Extract garment images from everyday images!
Data