# theprint-10B-MoE-A3B
A Mixture of Experts model built on Llama 3.2 3B, combining four specialized fine-tunes with a general-purpose model.
## Architecture
- Base model: theprint/GeneralChat-Llama3.2-3B
- Gate mode: Hidden
- Dtype: bfloat16
- Experts: 4
## Experts
| Expert | Specialization |
|---|---|
| LLM-Data-Science-Llama3.2-3B | Machine learning, neural networks, fine-tuning, pre-training |
| CreativeWriter-Llama3.2-3B | Fiction writing, story structure, scene development, plot analysis |
| Llama-3.2-3B-VanRossum | Python programming, debugging, algorithm implementation |
| CogBeTh-Llama3.2-3B | Mental health support, anxiety, stress management, self-care |
## How It Works
The gates were initialized with mergekit's `hidden` gate mode, which uses hidden-state representations of domain prompts to seed each expert's router weights. At inference time, the router sends each token to the most relevant expert(s) based on the content of the prompt. Each expert was fine-tuned for its domain before being merged into this MoE architecture with mergekit.
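The exact merge configuration is not published in this card. Below is a minimal sketch of what a mergekit MoE config for this layout could look like; the full expert repository IDs (the `theprint/` prefixes) and the positive prompts are assumptions, not values taken from the actual merge.

```python
# Sketch of a mergekit-moe style configuration matching the architecture above.
# Repo IDs and positive prompts are illustrative assumptions.
import yaml  # pip install pyyaml

config = {
    "base_model": "theprint/GeneralChat-Llama3.2-3B",
    "gate_mode": "hidden",   # gates seeded from hidden-state prompt representations
    "dtype": "bfloat16",
    "experts": [
        {"source_model": "theprint/LLM-Data-Science-Llama3.2-3B",
         "positive_prompts": ["machine learning", "neural networks", "fine-tuning"]},
        {"source_model": "theprint/CreativeWriter-Llama3.2-3B",
         "positive_prompts": ["fiction writing", "story structure", "plot analysis"]},
        {"source_model": "theprint/Llama-3.2-3B-VanRossum",
         "positive_prompts": ["python programming", "debugging", "algorithms"]},
        {"source_model": "theprint/CogBeTh-Llama3.2-3B",
         "positive_prompts": ["mental health", "anxiety", "stress management"]},
    ],
}

# Write the config to disk; the merge itself would then be run with
# mergekit's MoE entry point, e.g.:
#   mergekit-moe moe_config.yaml ./theprint-10B-MoE-A3B
with open("moe_config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```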
## Usage
Compatible with any Llama 3.2 inference setup. No special configuration is required; the routing happens automatically.
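For example, a minimal sketch using Hugging Face transformers (the repository ID below is inferred from the model name and may differ):

```python
# Minimal inference sketch; the repo ID is an assumption based on the model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "theprint/theprint-10B-MoE-A3B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge dtype
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain gradient descent in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```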