- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 17
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
  Paper • 2310.14566 • Published • 27
- SILC: Improving Vision Language Pretraining with Self-Distillation
  Paper • 2310.13355 • Published • 9
- Conditional Diffusion Distillation
  Paper • 2310.01407 • Published • 20
Collections including paper arxiv:2309.11419
- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 55
- Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
  Paper • 2311.05698 • Published • 14
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
  Paper • 2311.06242 • Published • 95
- PolyMaX: General Dense Prediction with Mask Transformer
  Paper • 2311.05770 • Published • 11
- Multimodal Foundation Models: From Specialists to General-Purpose Assistants
  Paper • 2309.10020 • Published • 41
- Kosmos-2.5: A Multimodal Literate Model
  Paper • 2309.11419 • Published • 55
- AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
  Paper • 2309.16058 • Published • 56
- Jointly Training Large Autoregressive Multimodal Models
  Paper • 2309.15564 • Published • 8