Post
3813
🚀 Introducing Robust-U1: Teaching MLLMs to Self-Recover Corrupted Visual Content
Multimodal Large Language Models (MLLMs) have achieved impressive visual understanding, yet they remain highly brittle under real-world corruptions—noise, blur, compression artifacts, adverse weather.
Standard MLLMs suffer dramatic performance drops, and existing robustness solutions come with fundamental limits: black‑box feature alignment lacks interpretability, while white‑box text reasoning cannot restore the lost pixel‑level visual details. This raises a crucial question:
🧐 Can MLLMs recover corrupted visual content by themselves?
If the answer is yes, we can move beyond merely “compensating” for corruption and instead build a more intrinsic, generalizable form of resilience. Robust-U1 is our answer to that question.
💡 Paper: https://arxiv.org/abs/2606.08063
🔗 Code: github.com/jqtangust/Robust-U1
🌍 Demo: Jiaqi-hkust/Robust-U1
Multimodal Large Language Models (MLLMs) have achieved impressive visual understanding, yet they remain highly brittle under real-world corruptions—noise, blur, compression artifacts, adverse weather.
Standard MLLMs suffer dramatic performance drops, and existing robustness solutions come with fundamental limits: black‑box feature alignment lacks interpretability, while white‑box text reasoning cannot restore the lost pixel‑level visual details. This raises a crucial question:
🧐 Can MLLMs recover corrupted visual content by themselves?
If the answer is yes, we can move beyond merely “compensating” for corruption and instead build a more intrinsic, generalizable form of resilience. Robust-U1 is our answer to that question.
💡 Paper: https://arxiv.org/abs/2606.08063
🔗 Code: github.com/jqtangust/Robust-U1
🌍 Demo: Jiaqi-hkust/Robust-U1