Decoding strategy of Phi-4 Multimodal

#50

by Zhengyang - opened Mar 21, 2025

Mar 21, 2025

Dear authors,
Thank you for the great work. What decoding strategy does Phi-4 Multimodal use? Is it beam search or top-k sampling? I couldn't find it in the configuration file.

Best,
Zhengyang

fanruchao

Mar 23, 2025

Hi @Zhengyang,

For speech/audio tasks, we simply used greedy search (top-1) for the benchmark. You can try other options for more diverse output if you like.

Thanks,
Ruchao
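
For readers comparing the options mentioned in this thread, here is a minimal sketch of the difference between greedy (top-1) selection and top-k sampling for a single decoding step. It is a toy illustration in plain Python, not the model's actual decoding code; the function names `greedy_pick` and `top_k_sample` and the `toy_logits` values are made up for this example.

```python
import math
import random


def greedy_pick(logits):
    """Greedy (top-1): return the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits, k, rng):
    """Top-k sampling: keep the k highest-scoring tokens, then draw one
    of them with probability proportional to its softmax weight."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Hypothetical scores over a 4-token vocabulary (illustrative values only).
toy_logits = [0.1, 2.5, 1.9, -0.7]

greedy_choice = greedy_pick(toy_logits)  # always index 1, the largest logit

rng = random.Random(0)  # seeded so repeated runs give the same draws
sampled_choices = [top_k_sample(toy_logits, 2, rng) for _ in range(5)]

print("greedy:", greedy_choice)
print("top-k (k=2) samples:", sampled_choices)
```

Greedy selection is deterministic, which is convenient for benchmarks, while top-k sampling can return different tokens on each call and so gives more diverse output. If the model is run through Hugging Face `transformers`, `generate(..., do_sample=False, num_beams=1)` corresponds to greedy decoding and `do_sample=True` with `top_k=...` enables top-k sampling; which defaults the model's shipped generation config uses is not stated in this thread.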
