MeyerAliba
Meyer54
AI & ML interests
None yet
Recent Activity
liked
a model
8 days ago
YatharthS/MiraTTS
upvoted
an
article
8 days ago
LLM based Audio models
Organizations
None yet
upvoted
an
article
8 days ago
Article
LLM based Audio models
•
42
reacted to
YatharthS's
post with 🔥
8 days ago
Post
3393
🤯 🤯 Released a high-quality fine-tuned LLM-based TTS model that can generate realistic, clear 48 kHz audio at over 100x realtime speed! 🤯 🤯
Github link: https://github.com/ysharma3501/MiraTTS
Model link: https://github.com/ysharma3501/MiraTTS
Blog explaining LLM TTS models: https://huggingface.co/blog/YatharthS/llm-tts-models
reacted to
martinsu's
post
13 days ago
Post
3298
https://huggingface.co/blog/martinsu/potus-broke-my-pipeline
How POTUS Completely Broke My Flash 2.5-Based Guardrail
Did quite a bit of deep research on this one, since it IMHO matters. At first I used this story to amuse fellow MLOps guys, but then I went deeper and was surprised.
To those who don't want to read too much, in plain English: when you give the model a high-stakes statement that clashes with what it "knows" about the world, it gets more brittle. Sometimes to a point of being unusable.
Or an even shorter version: do not clash with the model's given worldview, or it will degrade to some extent.
And in practice, it means that in lower-resource languages like Latvian and Finnish (and probably others), Flash 2.5 is an unreliable guardrail model when something clashes with the model's general "worldview".
However, I'm sure this degradation applies to other languages and models as well to varying extents.
In one totally normal week of MLOps, my news summarization pipeline started failing intermittently. Nothing was changed. No deploys. No prompt edits. No model version bump (as far as I could tell). Yet the guardrail would suddenly turn into a grumpy judge and reject outputs for reasons that felt random, sometimes even contradicting itself between runs. It was the worst kind of failure: silent, flaky, and impossible to reproduce on demand.
Then I noticed the pattern: it started when one specific named entity appeared in the text: Donald Trump (and later, in tests, Bernie Sanders too).
And then down the rabbit hole I went.
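One way to make the "contradicting itself between runs" symptom measurable is to call the guardrail several times on the same text and record how often the verdicts agree. The sketch below is only an illustration under stated assumptions: `judge` is a hypothetical callable standing in for a real guardrail call, the `"accept"`/`"reject"` labels are invented, and `make_flaky_judge` is a stub that simulates flakiness. None of this is taken from the post.

```python
from collections import Counter
from typing import Callable, Tuple


def verdict_agreement(
    judge: Callable[[str], str], text: str, runs: int = 5
) -> Tuple[str, float]:
    """Call `judge` on the same text `runs` times.

    Returns the majority verdict and the share of runs that agreed with it.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    verdicts = Counter(judge(text) for _ in range(runs))
    majority, count = verdicts.most_common(1)[0]
    return majority, count / runs


def make_flaky_judge() -> Callable[[str], str]:
    """Hypothetical stand-in for a guardrail: rejects on every third call."""
    calls = {"n": 0}

    def judge(text: str) -> str:
        calls["n"] += 1
        return "reject" if calls["n"] % 3 == 0 else "accept"

    return judge


majority, share = verdict_agreement(make_flaky_judge(), "sample summary", runs=6)
print(majority, round(share, 2))
```

An agreement share well below 1.0 on unchanged input would point to the kind of run-to-run instability described above, as opposed to a consistent (if wrong) rejection.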
reacted to
YatharthS's
post with 🔥
13 days ago
Post
2817
I just released LayaCodec, a highly efficient neural audio tokenizer/codec for TTS models, far better than most previous audio tokenizers.
🤯 Next-gen TTS models that use this could achieve several hundred times real-time speed while producing clearer audio!! 🤯
GitHub repo: https://github.com/ysharma3501/LayaCodec
Model: YatharthS/LayaCodec
upvoted
an
article
about 1 month ago
Article
How to make NeuTTS-air generate over 200 seconds of audio in a single second.
•
21
reacted to
samerzaher80's
post
about 1 month ago
Post
1851
Need Help Getting arXiv Endorsement for My AI Research Paper
Hi everyone,
I hope you're doing well. I'm trying to publish my new AI research paper on arXiv under the cs.AI category, but I currently need an endorser who is already authorized for cs.AI submissions.
If anyone here is registered as a cs.AI endorser and is willing to help, I would truly appreciate it.
Here is the official arXiv endorsement request link:
https://arxiv.org/auth/endorse?x=EZEMO7
(Backup: http://arxiv.org/auth/endorse.php, code: EZEMO7)
My research:
It's part of the AetherMind project, a self-reflective NLI reasoning system inspired by human cognitive consistency and also used in Alzheimer's research. If needed, I can share the abstract or full PDF.
Thank you so much to anyone who can support.
- Sameer S. Najm
reacted to
YatharthS's
post with 🔥
about 1 month ago
Post
1680
Just uploaded a detailed blog about my findings in optimizing NeuTTS to generate 200 seconds of audio in a single second. Also went in depth on NeuTTS's architecture. Will be happy to answer any questions.
https://huggingface.co/blog/YatharthS/making-neutts-200x-realtime
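For readers unfamiliar with the "Nx realtime" figures quoted in these posts, the number is simply seconds of audio produced per second of compute. A minimal sketch of that arithmetic follows; the function name `realtime_speedup` is an illustrative assumption and not part of NeuTTS or any other library.

```python
def realtime_speedup(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Seconds of audio generated per second of wall-clock compute time."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return audio_seconds / wall_clock_seconds


# 200 seconds of audio generated in 1 second of compute.
speedup = realtime_speedup(200.0, 1.0)
print(f"{speedup:.0f}x realtime")
```

Under this definition, "200 seconds of audio in a single second" corresponds to a 200x realtime speedup.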