Saying “speech tokens are just another language” is technically true but practically incomplete. You don’t know whether the LLM is actually using the speech unless you break things on purpose.
For example:
If you mask the speech tokens while keeping the text prompt unchanged and the model suddenly fails, that’s a strong signal the information was coming from the audio.
If swapping the spoken content (but not the text) changes the output, the model is clearly extracting meaning from the speech itself.
If attention and gradient attribution consistently point to the specific speech-token regions tied to the answer, that’s another indicator.
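The first two probes can be sketched as a small ablation harness. This is a minimal illustration, not any particular toolkit’s API: `predict` stands in for whatever function maps (speech tokens, text prompt) to an answer, `MASK` is a hypothetical mask-token id, and `toy_predict` is a made-up stand-in model so the script runs end to end.

```python
MASK = -1  # hypothetical mask-token id; a real model would use its own special token

def mask_ablation(predict, speech_tokens, text_prompt):
    """Probe 1: replace every speech token with MASK, keep the text prompt fixed.
    If the answer changes or degrades, information was flowing from the audio."""
    baseline = predict(speech_tokens, text_prompt)
    masked = predict([MASK] * len(speech_tokens), text_prompt)
    return baseline, masked, baseline != masked

def swap_ablation(predict, speech_tokens, alt_speech_tokens, text_prompt):
    """Probe 2: swap in speech tokens from a different utterance, keep the text fixed.
    A changed answer means the model is extracting meaning from the speech itself."""
    baseline = predict(speech_tokens, text_prompt)
    swapped = predict(alt_speech_tokens, text_prompt)
    return baseline, swapped, baseline != swapped

# Toy stand-in "model" for illustration only: answers with the most frequent speech token.
def toy_predict(speech_tokens, text_prompt):
    return max(set(speech_tokens), key=speech_tokens.count)

if __name__ == "__main__":
    utt_a, utt_b = [7, 7, 3], [5, 5, 2]  # two "utterances" as token-id lists
    print(mask_ablation(toy_predict, utt_a, "transcribe"))
    print(swap_ablation(toy_predict, utt_a, utt_b, "transcribe"))
```

The same harness extends to the third probe by replacing the boolean comparison with a per-token attribution score (e.g. gradients with respect to the speech embeddings) and checking which token regions it concentrates on.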