arxiv:2601.09278

M^3Searcher: Modular Multimodal Information Seeking Agency with Retrieval-Oriented Reasoning

Published on Jan 14

Abstract

M^3Searcher is a multimodal information-seeking agent that decouples information acquisition from answer derivation, is trained with retrieval-oriented multi-objective rewards, and outperforms existing approaches on complex multimodal tasks.

AI-generated summary

Recent advances in DeepResearch-style agents have demonstrated strong capabilities in autonomous information acquisition and synthesis from real-world web environments. However, existing approaches remain fundamentally limited to the text modality. Extending autonomous information-seeking agents to multimodal settings introduces two critical challenges: the specialization-generalization trade-off that emerges when training models for multimodal tool use at scale, and the severe scarcity of training data capturing complex, multi-step multimodal search trajectories. To address these challenges, we propose M^3Searcher, a modular multimodal information-seeking agent that explicitly decouples information acquisition from answer derivation. M^3Searcher is optimized with a retrieval-oriented multi-objective reward that jointly encourages factual accuracy, reasoning soundness, and retrieval fidelity. In addition, we develop MMSearchVQA, a multimodal multi-hop dataset to support retrieval-centric RL training. Experimental results demonstrate that M^3Searcher outperforms existing approaches and exhibits strong transfer adaptability and effective reasoning in complex multimodal tasks.
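
The abstract does not say how the three reward objectives are combined. As a rough illustration only, the sketch below assumes a simple weighted sum of three per-trajectory scores in [0, 1]. The names RewardWeights and combined_reward, the weight values, and the weighted-sum form itself are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: the abstract does not state how the
    # objectives are balanced against each other.
    accuracy: float = 0.5
    reasoning: float = 0.25
    retrieval: float = 0.25


def combined_reward(
    accuracy: float,
    reasoning: float,
    retrieval: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of three per-trajectory scores, each expected in [0, 1].

    This is an assumed combination rule for illustration, not the
    reward used by M^3Searcher.
    """
    scores = (
        ("accuracy", accuracy),
        ("reasoning", reasoning),
        ("retrieval", retrieval),
    )
    for name, score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {score}")
    return (
        weights.accuracy * accuracy
        + weights.reasoning * reasoning
        + weights.retrieval * retrieval
    )
```

Under these assumed weights, a trajectory with a correct answer but poor reasoning and retrieval scores would earn only part of the maximum reward, which is one way a training signal could favor retrieval quality and not just final-answer correctness.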
