🚀 Sonic: A lightweight Python audio processing library with tempo matching, BPM detection, time-stretching, resampling & track blending — now with GPU (CUDA) acceleration for 10x speed!
Perfect for quick remixes, batch edits or syncing tracks.
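As a rough illustration of the arithmetic behind tempo matching (this is not Sonic's API; both helper names below are hypothetical), the time-stretch rate is the ratio of the target BPM to the source BPM, and the stretched clip length follows from it:

```python
def stretch_rate(source_bpm: float, target_bpm: float) -> float:
    """Playback-rate factor that moves source_bpm onto target_bpm (>1 speeds up)."""
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("BPM values must be positive")
    return target_bpm / source_bpm


def stretched_duration(duration_s: float, rate: float) -> float:
    """Length of the clip, in seconds, after time-stretching by `rate`."""
    return duration_s / rate


# Hypothetical example: sync a 120 BPM, 3-minute track to a 128 BPM track.
rate = stretch_rate(source_bpm=120.0, target_bpm=128.0)
new_len = stretched_duration(180.0, rate)
print(f"rate={rate:.4f}, new length={new_len:.2f}s")
```

A real time-stretcher would change the duration by this factor while keeping pitch constant; the sketch only covers the ratio itself.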
This model has been trained and validated on external datasets to support medical research workflows. It is designed to provide reproducible benchmarks and serve as a foundation for further exploration in healthcare AI.
Key highlights:
- Built for medical research and diagnostic study contexts
- Validated against external datasets for reliability
- Openly available to empower the community in building stronger, more effective solutions
This release is part of my ongoing effort to make impactful AI research accessible through **Modotte**. A detailed blog post explaining the methodology, dataset handling, and validation process will be published soon.
Are Large Language Models actually becoming more intelligent, or just better at seeming intelligent?
There is a noticeable shift happening in the LLM space.
Models today can:
- Generate cleaner and more structured code.
- Explain complex topics in simpler ways.
- Maintain longer and more coherent conversations.
Yet at the same time, they still:
- Produce confident hallucinations.
- Fail in multi-step reasoning tasks.
- Break under slightly unfamiliar or challenging inputs.
This raises a critical question.
Are we advancing intelligence, or optimizing presentation?
Most improvements so far seem driven by:
- Larger datasets.
- Increased scale.
- Alignment techniques like RLHF.
But these do not necessarily lead to genuine reasoning ability.
What still appears fundamentally missing:
- Persistent memory across interactions.
- True reasoning rather than pattern completion.
- Grounded understanding connected to real-world context.
- Reliable self-correction and verification mechanisms.
If current scaling trends start to plateau, the next breakthrough will not come from doing more of the same.
So the real question for the community is:
If you were designing the next generation of AI systems, where would you focus?
A. Larger models and compute
B. Higher-quality and structured data
C. Agent-based systems with tool use and memory
D. New architectures beyond transformers
This is not just a technical discussion. It defines where AI is actually heading over the next few years.
I am interested to hear how others are thinking about this.
Just did something I’ve been meaning to try for ages.
In only 3 hours, on 10 billion+ tokens, I trained a custom BPE + tiktoken-style tokenizer using my new library microtok — and it hits the same token efficiency as Qwen3.
Tokenizers have always felt like black magic to me. We drop them into every LLM project, but actually training one from scratch? That always seemed way too complicated.
Turns out it doesn’t have to be.
microtok makes the whole process stupidly simple — literally just 3 lines of code. No heavy setup, no GPU required. I built it on top of the Hugging Face tokenizers library so it stays clean, fast, and actually understandable.
If you’ve ever wanted to look under the hood and build your own optimized vocabulary instead of just copying someone else’s, this is the entry point you’ve been waiting for.
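For readers who want to see what BPE training does under the hood, here is a minimal pure-Python sketch of the merge-learning loop. It is not microtok's API and not the Hugging Face tokenizers implementation; `train_bpe` is a hypothetical helper, and it skips byte-level handling, pre-tokenization rules, and special tokens.

```python
from collections import Counter


def train_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` BPE merge rules from whitespace-split words."""
    # Each word starts as a tuple of single characters, weighted by frequency.
    words = Counter(tuple(word) for line in corpus for word in line.split())
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the winning pair fused into one symbol.
        merged_words = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged_words[tuple(out)] += freq
        words = merged_words
    return merges


# Toy corpus: the two learned merges together build the symbol "low".
merges = train_bpe(["low lower lowest", "low low"], num_merges=2)
print(merges)
```

The learned merge list is the vocabulary-building part of a BPE tokenizer; production libraries run the same idea with far more efficient data structures.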
I wrote up the full story, threw in a ready-to-run Colab template, and dropped the trained tokenizer on Hugging Face.
Introducing Seekify — a truly non‑rate‑limiting search library for Python
Tired of hitting rate limits when building search features? I’ve built Seekify, a lightweight Python library that lets you perform searches without the usual throttling headaches.
🔹 Key highlights
- Simple API — plug it in and start searching instantly
- No rate‑limiting restrictions
- Designed for developers who need reliable search in projects, scripts, or apps
📦 Available now on PyPI:
pip install seekify
👉 Check out the repo: https://github.com/Parveshiiii/Seekify
I’d love feedback, contributions, and ideas for real‑world use cases. Let’s make search smoother together!
🚀 Wanna train your own AI Model or Tokenizer from scratch?
Building models isn’t just for big labs anymore — with the right data, compute, and workflow, you can create **custom AI models** and **tokenizers** tailored to any domain. Whether it’s NLP, domain‑specific datasets, or experimental architectures, training from scratch gives you full control over vocabulary, embeddings, and performance.
✨ Why train your own?
- Full control over vocabulary & tokenization
- Domain‑specific optimization (medical, legal, technical, etc.)
- Better performance on niche datasets
- Freedom to experiment with architectures
⚡ The best part?
- Tokenizer training (TikToken / BPE) can be done in **just 3 lines of code**.
- Model training runs smoothly on **Google Colab notebooks**, with no expensive hardware required.
🧬 Experimenting with "Dynamic Chaos" in Tamil SLMs
Hi everyone! I just published a new experimental study on Small Language Model (SLM) resilience.
I took the Qwen2.5-0.5B model and put it through a "Chaos Phase" to see how much weight data a tiny model can lose before its understanding of classical Tamil grammar breaks.
Key highlights of the study:
- Target Data: Fine-tuned on the Thirukkural (1,330 couplets + modern explanations).
- The Chaos Step: Applied 20% random weight pruning, with "Layer Protection" for the Token Embeddings and LM Head to keep the characters readable.
- Compression: 4-bit (Q4_K_M) quantization for extreme efficiency.
- Result: A surrealist classical Tamil model that is ultra-light (~300MB) and ultra-fast!
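The "Chaos Step" with layer protection can be sketched roughly as follows. This is a toy illustration on plain Python lists, not the code used in the study: `prune_weights` is a hypothetical helper, the layer names are assumed placeholders, and the study's actual pruning procedure may differ.

```python
import random


def prune_weights(layers, fraction=0.2, protected=("embed_tokens", "lm_head"), seed=0):
    """Zero out `fraction` of the weights (rounded) in every unprotected layer."""
    rng = random.Random(seed)
    pruned = {}
    for name, weights in layers.items():
        if any(key in name for key in protected):
            pruned[name] = list(weights)  # protected layers are copied untouched
            continue
        k = round(len(weights) * fraction)
        # Pick k distinct positions uniformly at random and set them to zero.
        drop = set(rng.sample(range(len(weights)), k))
        pruned[name] = [0.0 if i in drop else w for i, w in enumerate(weights)]
    return pruned


# Toy "model": three flat weight lists with assumed, illustrative names.
toy = {
    "model.embed_tokens.weight": [0.5] * 10,
    "model.layers.0.mlp.weight": [0.5] * 10,
    "lm_head.weight": [0.5] * 10,
}
result = prune_weights(toy)
print({name: w.count(0.0) for name, w in result.items()})
```

Keeping the embedding and output layers intact means the model can still map tokens to and from text, while the middle layers absorb the damage.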
📢 XenArcAI is now Modotte: A New Chapter Begins! 🚀
Hello everyone,
We are thrilled to announce that XenArcAI is officially rebranding to Modotte!
Since our journey began, we’ve been committed to pushing the boundaries of AI through open-source innovation, research, and high-quality datasets. As we continue to evolve, we wanted a name that better represents our vision for a modern, interconnected future in the tech space.
What is changing?
The Name: Moving forward, all our projects, models, and community interactions will happen under the Modotte banner.
The Look: You’ll see our new logo and a fresh color palette appearing across our platforms.
What is staying the same?
The Core Team: It’s still the same people behind the scenes, including our founder, Parvesh Rawal.
Our Mission: We remain dedicated to releasing state-of-the-art open-source models and datasets.
Our Continuity: All existing models, datasets, and projects will remain exactly as they are—just with a new home.
This isn’t just a change in appearance; it’s a commitment to our next chapter of growth and discovery. We are so grateful for your ongoing support as we step into this new era.
I’m excited to release hawky-ai-Qwen3-0.6B-Marketing-MoT, a specialized SLM designed for deep strategic reasoning in performance marketing.
While small at 0.6B parameters, this model punches way above its weight class by utilizing a Mixture of Thoughts (MoT) framework. It doesn't just give you an answer; it thinks through the logic of Meta Ads scaling, GA4 attribution, and unit economics before providing a strategic recommendation.
Key Features:
- Thinking-First: Trained on 1,500+ critical thinking scenarios.
- MoT Framework: 5 distinct reasoning styles (Linear, Exploratory, Critical, Deconstructive, Analogical).
- SLM Speed: Perfect for low-latency, high-precision marketing audits.
Check it out on Hugging Face: 🔗 Sri-Vigneshwar-DJ/hawky-ai-Qwen3-0.6B-Marketing-MoT
Introducing Hawky-AI H1 4B PM: The First Open-Source LLM for Performance Marketing 🎯
Hey HF Community! 👋
Just released the first LLM fine-tuned specifically for Performance Marketing.
What is it? Gemma 3 4B distilled from Claude Opus 4.5 with expert-level marketing knowledge.
Covers:
- 📱 Meta Ads (campaign structure, bidding, scaling, creative fatigue)
- 🔍 Google Ads (Quality Score, Performance Max, lead gen)
- 📊 Measurement (ROAS vs MER, incrementality, LTV:CAC)
- 🎨 Creative Strategy (hook rates, A/B testing, funnel creative)
Why we built it: Generic LLMs say "optimize your targeting", which is not helpful. This model gives specific frameworks like "frequency at 4.5 + CTR drop = creative fatigue, here's the fix..."
Technical:
- Base: Gemma 3 4B
- Method: QLoRA (r=64)
- Teacher: Claude Opus 4.5
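To give a feel for what r=64 means, here is a small arithmetic sketch of how many trainable parameters a LoRA adapter adds to a single linear layer. QLoRA trains such adapters on top of a quantized base model. The layer dimensions below are illustrative assumptions, not values taken from the Gemma 3 4B configuration, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer of shape (d_out, d_in)."""
    # LoRA learns two low-rank factors: A is (rank x d_in), B is (d_out x rank).
    return rank * d_in + d_out * rank


# Illustrative layer size only; not read from any real model config.
d_in, d_out, rank = 2560, 2560, 64
adapter = lora_param_count(d_in, d_out, rank)
full = d_in * d_out
print(f"adapter params: {adapter:,} ({adapter / full:.1%} of the full matrix)")
```

A higher rank gives the adapter more capacity at the cost of more trainable parameters; the count grows linearly in the rank.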
🦅 Introducing Hawky AI H1 Mini 4B: A Domain-Specific Model for Performance Marketing
Hey HuggingFace community! 👋
We're excited to share our first open-source release: **Hawky AI H1 Mini 4B Experimental** - a Gemma 3 4B model fine-tuned specifically for Meta advertising and performance marketing strategy.
🎯 Why We Built This
At [Hawky.ai](https://hawky.ai), we build AI-powered creative intelligence tools for performance marketers. We work with major agencies (WPP, Madison, GroupM) and brands (TVS Motors, Tanishq, Bajaj Finserv) on campaign optimization.
We wanted to explore: Can a small, domain-specific model provide expert-level guidance on performance marketing?
Specifically, we focused on Meta's Andromeda algorithm - the AI system that now powers ad delivery across Facebook and Instagram. Understanding Andromeda is crucial for modern media buying, but the knowledge is scattered and constantly evolving.
🧠 What Makes This Different
Chain-of-Thought Reasoning
The model doesn't just answer - it **thinks through problems** step-by-step.
Domain-specific reasoning is crucial when working with big-budget campaigns on Meta. That's why we've launched an experimental Chain-of-Thought (CoT) reasoning model for critical thinking, tailored to Meta's Andromeda algorithm-based campaign structuring and optimization.
The recent update to Meta's ad algorithm is very difficult to crack, and even the latest models struggle to keep up with it. To address this, we've created a small experimental dataset for fine-tuning models to better tackle Meta's Andromeda algorithm: Sri-Vigneshwar-DJ/hawky-ai-andromeda-dataset
Hey everyone! We’re excited to introduce our new Telegram group: https://t.me/XenArcAI
This space is built for **model builders, tech enthusiasts, and developers** who want to learn, share, and grow together. Whether you’re just starting out or already deep into AI/ML, you’ll find a supportive community ready to help with knowledge, ideas, and collaboration.
💡 Join us to:
- Connect with fellow developers and AI enthusiasts
- Share your projects, insights, and questions
- Learn from others and contribute to a growing knowledge base
👉 If you’re interested, hop in and be part of the conversation: https://t.me/XenArcAI