- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
Collections including paper arxiv:2410.13085
- MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
  Paper • 2410.13085 • Published • 23
- MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
  Paper • 2408.02900 • Published • 30
- MediAug: Exploring Visual Augmentation in Medical Imaging
  Paper • 2504.18983 • Published • 7
- Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
  Paper • 2410.18387 • Published

- FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models
  Paper • 2411.18672 • Published
- CoMT: Chain-of-Medical-Thought Reduces Hallucination in Medical Report Generation
  Paper • 2406.11451 • Published
- Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
  Paper • 2506.07044 • Published • 113
- μ^2Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation
  Paper • 2507.00316 • Published • 15