Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2511.14993

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 189
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 43

papers-most-view-by-month

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 294
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30, 2025 • 546
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 506

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230

Foundation Models

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230

cabinet-data_curation

Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy

Paper • 2507.01352 • Published Jul 2, 2025 • 56
A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models

Paper • 2507.13563 • Published Jul 17, 2025 • 53
Scaling Laws for Optimal Data Mixtures

Paper • 2507.09404 • Published Jul 12, 2025 • 37
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230

Read But Not Implemented

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Paper • 2512.16093 • Published Dec 18, 2025 • 95
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 234
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published Dec 18, 2025 • 214
Sharp Monocular View Synthesis in Less Than a Second

Paper • 2512.10685 • Published Dec 11, 2025 • 28

Large-scale T2I, T2V

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks

Paper • 2511.15065 • Published Nov 19, 2025 • 77
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 128
Canvas-to-Image: Compositional Image Generation with Multimodal Controls

Paper • 2511.21691 • Published Nov 26, 2025 • 36

SPATIALGEN: Layout-guided 3D Indoor Scene Generation

Paper • 2509.14981 • Published Sep 18, 2025 • 27
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230
Kling-Omni Technical Report

Paper • 2512.16776 • Published Dec 18, 2025 • 169
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Paper • 2512.04677 • Published Dec 4, 2025 • 170

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 189
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 43

cabinet-data_curation

Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy

Paper • 2507.01352 • Published Jul 2, 2025 • 56
A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models

Paper • 2507.13563 • Published Jul 17, 2025 • 53
Scaling Laws for Optimal Data Mixtures

Paper • 2507.09404 • Published Jul 12, 2025 • 37
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230

papers-most-view-by-month

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 294
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30, 2025 • 546
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 506

Read But Not Implemented

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Paper • 2512.16093 • Published Dec 18, 2025 • 95
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 234
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published Dec 18, 2025 • 214
Sharp Monocular View Synthesis in Less Than a Second

Paper • 2512.10685 • Published Dec 11, 2025 • 28

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230

Large-scale T2I, T2V

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks

Paper • 2511.15065 • Published Nov 19, 2025 • 77
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 128
Canvas-to-Image: Compositional Image Generation with Multimodal Controls

Paper • 2511.21691 • Published Nov 26, 2025 • 36

Foundation Models

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230

SPATIALGEN: Layout-guided 3D Indoor Scene Generation

Paper • 2509.14981 • Published Sep 18, 2025 • 27
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230
Kling-Omni Technical Report

Paper • 2512.16776 • Published Dec 18, 2025 • 169
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Paper • 2512.04677 • Published Dec 4, 2025 • 170

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs