Abstract
We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three sizes: 3B, 8B, and 14B parameters. For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving. In addition, we present our recipe for deriving the Ministral 3 models through Cascade Distillation, an iterative technique that combines pruning with continued training under distillation. Each model comes with image understanding capabilities, and all are released under the Apache 2.0 license.
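The abstract only names Cascade Distillation as "iterative pruning and continued training with distillation" without giving details, so the sketch below is a rough, hypothetical illustration of that general recipe rather than the authors' actual pipeline. The pruning criterion (row-norm magnitude), the width schedule, the temperature, and all function names (`prune_hidden_units`, `kd_loss`) are assumptions for a toy MLP stand-in, not the Ministral 3 training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def prune_hidden_units(linear_in: nn.Linear, linear_out: nn.Linear, keep: int):
    """Shrink the hidden dimension shared by two stacked linear layers.

    Keeps the `keep` hidden units whose input-weight rows have the largest
    L2 norm. This magnitude criterion is an assumption standing in for
    whatever pruning rule Cascade Distillation actually uses.
    """
    scores = linear_in.weight.norm(dim=1)            # one score per hidden unit
    idx = scores.topk(keep).indices.sort().values    # indices of units to keep
    new_in = nn.Linear(linear_in.in_features, keep)
    new_out = nn.Linear(keep, linear_out.out_features)
    with torch.no_grad():
        new_in.weight.copy_(linear_in.weight[idx])
        new_in.bias.copy_(linear_in.bias[idx])
        new_out.weight.copy_(linear_out.weight[:, idx])
        new_out.bias.copy_(linear_out.bias)
    return new_in, new_out

def kd_loss(student_logits, teacher_logits, T: float = 2.0):
    """Standard KL-based distillation loss; the temperature T is assumed."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

# Toy teacher: a two-layer MLP standing in for a full language model.
teacher = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 32)).eval()
student_in, student_act, student_out = teacher[0], nn.ReLU(), teacher[2]

hidden_schedule = [192, 128, 96]   # assumed cascade of progressively smaller widths
data = torch.randn(512, 64)        # placeholder data in lieu of a text corpus

for stage, width in enumerate(hidden_schedule):
    # 1) Prune the current student down to the next target width.
    student_in, student_out = prune_hidden_units(student_in, student_out, width)
    student = nn.Sequential(student_in, student_act, student_out)
    opt = torch.optim.AdamW(student.parameters(), lr=1e-3)

    # 2) Continued training with distillation against the frozen teacher.
    for step in range(100):
        x = data[torch.randint(0, len(data), (32,))]
        with torch.no_grad():
            t_logits = teacher(x)
        loss = kd_loss(student(x), t_logits)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"stage {stage}: width={width}, final kd loss={loss.item():.4f}")
```

The point of the cascade structure is that each pruned model is re-trained toward the teacher's outputs before being pruned again, so capacity is removed gradually rather than in one step; how Ministral 3 schedules those stages is not specified in the abstract.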
Community
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models (2025)
- T5Gemma 2: Seeing, Reading, and Understanding Longer (2025)
- MiniLingua: A Small Open-Source LLM for European Languages (2025)
- Gamayun's Path to Multilingual Mastery: Cost-Efficient Training of a 1.5B-Parameter LLM (2025)
- ELO: Efficient Layer-Specific Optimization for Continual Pretraining of Multilingual LLMs (2026)
- SiamGPT: Quality-First Fine-Tuning for Stable Thai Text Generation (2025)
- Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning (2026)