Chunked inference?
Is there a way to do chunked inference with canary-qwen-2.5b?
This approach works for canary, but I could not get it working for canary-qwen-2.5b: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_chunked_inference/aed/speech_to_text_aed_chunked_infer.py
Yes, there is, please refer to this code in the spaces demo:
https://huggingface.co/spaces/nvidia/canary-qwen-2.5b/blob/main/app.py#L30-L66
No ETA yet, but this feature will also eventually land in the NeMo speechlm2 collection.
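In the meantime, the basic idea from the spaces demo can be sketched as: split the waveform into fixed-length chunks, transcribe each chunk independently, and join the partial transcripts. The sketch below is not the demo code verbatim; `transcribe_chunk` is a hypothetical stand-in for whatever per-chunk call you use (e.g. `model.generate(...)` as shown on the canary-qwen-2.5b model card), and the 40-second default is an assumed chunk length, not an official value.

```python
# Hedged sketch of chunked inference: the chunking arithmetic is plain
# Python; the per-chunk model call is abstracted behind a callable.

def chunk_bounds(num_samples: int, sample_rate: int, chunk_secs: float = 40.0):
    """Return (start, end) sample-index pairs covering the whole signal."""
    step = int(chunk_secs * sample_rate)
    return [(s, min(s + step, num_samples)) for s in range(0, num_samples, step)]

def transcribe_long(audio, sample_rate, transcribe_chunk, chunk_secs: float = 40.0):
    """Transcribe each chunk independently and join the partial transcripts.

    `transcribe_chunk` is a hypothetical callable taking a waveform slice
    and returning its transcript (e.g. wrapping model.generate per chunk).
    """
    parts = [transcribe_chunk(audio[s:e])
             for s, e in chunk_bounds(len(audio), sample_rate, chunk_secs)]
    return " ".join(p.strip() for p in parts if p.strip())
```

Note that naive chunking can split words at chunk boundaries; the demo code linked above is the authoritative reference for how the boundaries are actually handled.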
Tell me exactly what you are trying to do, and I can help you. I have a custom application that implements exactly this kind of inference in a high-performance way, and I'd be happy to advise or show you how to do it.
Is it possible to specify the source and target language here via parameters too, e.g. transcribing to German? Or is the prompt the only option?
PS: I provide German audio samples, and when using the prompt I currently get back a gibberish mix of German and English.