Collections of multimodal (image+text) instruction fine-tuning datasets tailored for visual language models like LLaVA, Fuyu, or IDEFICS.
Victor Sanh
VictorSanh
Recent Activity
upvoted a collection 7 days ago: 🧬 Carbon
upvoted an article about 1 month ago: ML Intern Takes Our Post-Training Internship Test
liked a Space about 2 months ago: HuggingFaceFW/finephrase