ethananhtran's Collections
Read But Not Implemented
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
Paper • 2512.16093 • Published • 95
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
Paper • 2511.22699 • Published • 234
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper • 2512.16676 • Published • 214
Sharp Monocular View Synthesis in Less Than a Second
Paper • 2512.10685 • Published • 28
Latent Implicit Visual Reasoning
Paper • 2512.21218 • Published • 69
SemanticGen: Video Generation in Semantic Space
Paper • 2512.20619 • Published • 93
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
Paper • 2512.04677 • Published • 170
Spatia: Video Generation with Updatable Spatial Memory
Paper • 2512.15716 • Published • 33
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
Paper • 2512.19693 • Published • 65
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
Paper • 2511.14993 • Published • 230
PersonaLive! Expressive Portrait Image Animation for Live Streaming
Paper • 2512.11253 • Published • 36
Diffusion Transformers with Representation Autoencoders
Paper • 2510.11690 • Published • 166
Paper • 2412.18653 • Published • 86
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion
Paper • 2512.17504 • Published • 97
ProEdit: Inversion-based Editing From Prompts Done Right
Paper • 2512.22118 • Published • 18
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
Paper • 2511.22677 • Published • 31
FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction
Paper • 2512.16900 • Published • 11
StoryMem: Multi-shot Long Video Storytelling with Memory
Paper • 2512.19539 • Published • 18
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
Paper • 2512.23576 • Published • 65
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
Paper • 2512.24618 • Published • 146
Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion
Paper • 2512.23709 • Published • 49
mHC: Manifold-Constrained Hyper-Connections
Paper • 2512.24880 • Published • 290
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
Paper • 2512.23959 • Published • 110
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
Paper • 2601.00664 • Published • 56
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Paper • 2512.20578 • Published • 85
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
Paper • 2601.03252 • Published • 100
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
Paper • 2601.02151 • Published • 104
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper • 2601.05242 • Published • 210
Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers
Paper • 2601.04890 • Published • 41
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper • 2601.03233 • Published • 141
MMFormalizer: Multimodal Autoformalization in the Wild
Paper • 2601.03017 • Published • 105
Controlled Self-Evolution for Algorithmic Code Optimization
Paper • 2601.07348 • Published • 113
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Paper • 2601.08763 • Published • 143
VIBE: Visual Instruction Based Editor
Paper • 2601.02242 • Published • 63
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
Paper • 2601.08808 • Published • 38
Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey
Paper • 2601.11655 • Published • 60
LongCat-Flash-Thinking-2601 Technical Report
Paper • 2601.16725 • Published • 163