Daniel (Unsloth)
danielhanchen
AI & ML interests: None yet
Recent Activity
updated a model about 6 hours ago: unsloth/Mistral-Large-3-675B-Base-2512
updated a model about 7 hours ago: unsloth/Mistral-Large-3-675B-Instruct-2512-Eagle
updated a model about 7 hours ago: unsloth/Mistral-Large-3-675B-Instruct-2512-NVFP4
replied to their post about 18 hours ago: "You need to update to the latest llama.cpp version"
posted an update 3 days ago
Mistral's new Ministral 3 models can now be Run & Fine-tuned locally! (16GB RAM)
Ministral 3 models have vision support and best-in-class performance for their sizes.
14B Instruct GGUF: unsloth/Ministral-3-14B-Instruct-2512-GGUF
14B Reasoning GGUF: unsloth/Ministral-3-14B-Reasoning-2512-GGUF
Step-by-step Guide: https://docs.unsloth.ai/new/ministral-3
All GGUF, BnB, FP8 etc. variant uploads: https://huggingface.co/collections/unsloth/ministral-3
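For reference, a minimal sketch of loading the 14B Instruct GGUF with llama-cpp-python; the quant filename pattern and the prompt are assumptions, so pick whichever quant actually fits your 16GB of RAM:

from llama_cpp import Llama

# Download and load one quant directly from the Hugging Face repo.
llm = Llama.from_pretrained(
    repo_id="unsloth/Ministral-3-14B-Instruct-2512-GGUF",
    filename="*Q4_K_M*",  # assumed quant name; smaller quants need less RAM
    n_ctx=8192,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what you can do in one sentence."}]
)
print(out["choices"][0]["message"]["content"])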
posted an update 8 days ago
Qwen3-Next can now be Run locally! (30GB RAM)
Thinking GGUF: unsloth/Qwen3-Next-80B-A3B-Thinking-GGUF
Instruct GGUF: unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF
The models come in Thinking and Instruct versions and use a new architecture that gives ~10x faster inference than Qwen3-32B.
Step-by-step Guide: https://docs.unsloth.ai/models/qwen3-next
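If you only want a single quant rather than the whole repo, a sketch along these lines works with huggingface_hub; the allow_patterns glob is an assumption about the file naming, so check the repo's file list first:

from huggingface_hub import snapshot_download

# Grab only the shards of one quant, then point llama.cpp at the local files.
local_dir = snapshot_download(
    repo_id="unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF",
    allow_patterns=["*Q2_K_XL*"],  # assumed quant name that fits ~30GB of RAM
)
print("GGUF files downloaded to:", local_dir)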
posted an update 28 days ago
You can now run Kimi K2 Thinking locally with our Dynamic 1-bit GGUFs:
unsloth/Kimi-K2-Thinking-GGUF
We shrank the 1T model to 245GB (a 62% reduction) and retained ~85% of accuracy on Aider Polyglot. Use 247GB+ of RAM for fast inference.
We also collaborated with the Moonshot AI Kimi team on a system prompt fix!
Guide + fix details: https://docs.unsloth.ai/models/kimi-k2-thinking-how-to-run-locally
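Once llama-server is running with the GGUF loaded, the system prompt can be passed explicitly through its OpenAI-compatible endpoint; a rough sketch, where the port, prompt wording and sampling settings are assumptions:

import requests

# Query a locally running llama-server; the system message is where the
# recommended Kimi K2 system prompt would go (see the guide above).
resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are Kimi, an AI assistant."},  # assumed wording
            {"role": "user", "content": "Explain your reasoning style briefly."},
        ],
        "temperature": 1.0,  # assumed sampling setting
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])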
posted an update 4 months ago
Run DeepSeek-V3.1 locally on 170GB RAM with Dynamic 1-bit GGUFs!
GGUFs: unsloth/DeepSeek-V3.1-GGUF
The 715GB model is reduced to 170GB (roughly a 75% size reduction) by smartly quantizing layers.
The 1-bit GGUF passes all our code tests, and we fixed the chat template for llama.cpp-supported backends.
Guide: https://docs.unsloth.ai/basics/deepseek-v3.1
posted an update 4 months ago
Run OpenAI's new gpt-oss models locally with Unsloth GGUFs!
20b GGUF: unsloth/gpt-oss-20b-GGUF
120b GGUF: unsloth/gpt-oss-120b-GGUF
The 20b model runs in 14GB of RAM and the 120b in 66GB.
posted an update 5 months ago
It's Qwen3 week! We uploaded Dynamic 2-bit GGUFs for:
Qwen3-Coder: unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
Qwen3-2507: unsloth/Qwen3-235B-A22B-Instruct-2507-GGUF
So you can run them both locally! Guides are in the model cards.
posted an update 5 months ago
We fixed more issues! Use --jinja for all!
* Fixed Nanonets OCR-s unsloth/Nanonets-OCR-s-GGUF
* Fixed THUDM GLM-4 unsloth/GLM-4-32B-0414-GGUF
* DeepSeek Chimera v2 is uploading! unsloth/DeepSeek-TNG-R1T2-Chimera-GGUF
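The --jinja flag goes on the llama.cpp command line; a minimal sketch of launching llama-server with it from Python, where the local GGUF filename and port are assumptions:

import subprocess

# --jinja tells llama.cpp to apply the chat template embedded in the GGUF.
subprocess.run([
    "llama-server",
    "-m", "Nanonets-OCR-s-Q4_K_M.gguf",  # assumed local filename
    "--jinja",
    "--port", "8080",
])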
replied to their post 5 months ago: "Thank you!"
posted an update 5 months ago
Gemma 3n finetuning is now 1.5x faster and uses 50% less VRAM in Unsloth!
Click "Use this model" and click "Google Colab"!
unsloth/gemma-3n-E4B-it
unsloth/gemma-3n-E2B-it
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_(4B)-Conversational.ipynb
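For reference, a minimal LoRA fine-tuning sketch with Unsloth; the linked notebook is the reference recipe, and the sequence length, LoRA rank and target modules below are assumptions:

from unsloth import FastLanguageModel

# Load Gemma 3n in 4-bit, then attach LoRA adapters for memory-efficient training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3n-E4B-it",
    max_seq_length=2048,   # assumed
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # assumed LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)
# ...then train with TRL's SFTTrainer, as in the Colab notebook.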
Click "Use this model" and click "Google Colab"!
unsloth/gemma-3n-E4B-it
unsloth/gemma-3n-E2B-it
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_(4B)-Conversational.ipynb
posted an update 6 months ago
We updated lots of our GGUFs and uploaded many new ones!
* unsloth/dots.llm1.inst-GGUF
* unsloth/Jan-nano-GGUF
* unsloth/Nanonets-OCR-s-GGUF
* Updated and fixed Q8_0 upload for unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF
* Added Q2_K_XL for unsloth/DeepSeek-R1-0528-GGUF
* Updated and fixed Vision support for unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
posted an update 6 months ago
Mistral releases Magistral, their new reasoning models!
GGUFs to run: unsloth/Magistral-Small-2506-GGUF
Magistral-Small-2506 excels at mathematics and coding.
You can run the 24B model locally with just 32GB RAM by using our Dynamic GGUFs.
posted an update 6 months ago
New DeepSeek-R1-0528 1.65-bit Dynamic GGUF!
Run the model locally even more easily! It fits on a 192GB MacBook and runs at about 7 tokens/s.
DeepSeek-R1-0528 GGUFs: unsloth/DeepSeek-R1-0528-GGUF
Qwen3-8B DeepSeek-R1-0528 GGUFs: unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF
And read our Guide: https://docs.unsloth.ai/basics/deepseek-r1-0528
posted an update 7 months ago
Qwen3 128K Context Length: We've released Dynamic 2.0 GGUFs + 4-bit safetensors!
Fixed: these now work on any inference engine, and we fixed issues with the chat template.
Qwen3 GGUFs:
30B-A3B: unsloth/Qwen3-30B-A3B-GGUF
235B-A22B: unsloth/Qwen3-235B-A22B-GGUF
32B: unsloth/Qwen3-32B-GGUF
Read our guide on running Qwen3 here: https://docs.unsloth.ai/basics/qwen3-how-to-run-and-finetune
128K Context Length:
30B-A3B: unsloth/Qwen3-30B-A3B-128K-GGUF
235B-A22B: unsloth/Qwen3-235B-A22B-128K-GGUF
32B: unsloth/Qwen3-32B-128K-GGUF
All Qwen3 uploads: unsloth/qwen3-680edabfb790c8c34a242f95
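A sketch of opening one of the 128K GGUFs with the full window via llama-cpp-python; the quant filename is an assumption, and note that a 128K KV cache needs substantial memory on top of the weights:

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3-30B-A3B-128K-GGUF",
    filename="*Q4_K_M*",  # assumed quant
    n_ctx=131072,         # ask for the full 128K context window
)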
posted an update 8 months ago
Introducing Unsloth Dynamic v2.0 GGUFs!
Our v2.0 quants set new benchmarks on 5-shot MMLU and KL Divergence, meaning you can now run & fine-tune quantized LLMs while preserving as much accuracy as possible.
Llama 4: unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
DeepSeek-R1: unsloth/DeepSeek-R1-GGUF-UD
Gemma 3: unsloth/gemma-3-27b-it-GGUF
We made selective layer quantization much smarter. Instead of modifying only a subset of layers, we now dynamically quantize all layers so every layer can use a different bit-width. Our dynamic method can now be applied to all LLM architectures, not just MoEs.
Blog with Details: https://docs.unsloth.ai/basics/dynamic-v2.0
All our future GGUF uploads will leverage Dynamic 2.0 and our hand-curated 300K–1.5M token calibration dataset to improve conversational chat performance.
For accurate benchmarking, we built an evaluation framework to match the reported 5-shot MMLU scores of Llama 4 and Gemma 3. This allowed apples-to-apples comparisons between full-precision vs. Dynamic v2.0, QAT and standard iMatrix quants.
Dynamic v2.0 aims to minimize the performance gap between full-precision models and their quantized counterparts.
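For intuition, the KL Divergence part of the benchmark amounts to measuring how far the quantized model's next-token distribution drifts from full precision; a rough sketch of that measurement (not our evaluation code, and the tensor names are placeholders):

import torch.nn.functional as F
from torch import Tensor

def mean_kl(full_logits: Tensor, quant_logits: Tensor) -> float:
    # KL(P_full || P_quant), averaged over token positions: 0 means the quant
    # reproduces the full-precision distribution exactly; larger is worse.
    log_p = F.log_softmax(full_logits, dim=-1)
    log_q = F.log_softmax(quant_logits, dim=-1)
    return F.kl_div(log_q, log_p, log_target=True, reduction="batchmean").item()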
posted an update 8 months ago
You can now run Llama 4 on your own local device!
Run our Dynamic 1.78-bit and 2.71-bit Llama 4 GGUFs:
unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
You can run them on llama.cpp and other inference engines. See our guide here: https://docs.unsloth.ai/basics/tutorial-how-to-run-and-fine-tune-llama-4
posted an update 8 months ago
You can now run DeepSeek-V3-0324 on your own local device!
Run our Dynamic 2.42 and 2.71-bit DeepSeek GGUFs: unsloth/DeepSeek-V3-0324-GGUF
You can run them on llama.cpp and other inference engines. See our guide here: https://docs.unsloth.ai/basics/tutorial-how-to-run-deepseek-v3-0324-locally
posted an update 11 months ago
I uploaded DeepSeek R1 GGUFs!
unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF
unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF
2bit for MoE: unsloth/DeepSeek-R1-GGUF
unsloth/DeepSeek-R1-Zero-GGUF
More at unsloth/deepseek-r1-all-versions-678e1c48f5d2fce87892ace5
posted an update 11 months ago
We fixed many bugs in Phi-4 & uploaded fixed GGUF + 4-bit versions!
Our fixed versions score even higher on the Open LLM Leaderboard than Microsoft's!
GGUFs: unsloth/phi-4-GGUF
Dynamic 4-bit: unsloth/phi-4-unsloth-bnb-4bit
You can also now finetune Phi-4 for free on Colab: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb
Read our blog post for more details on the bug fixes: https://unsloth.ai/blog/phi4
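For reference, a minimal sketch of loading the dynamic 4-bit checkpoint with transformers, assuming the repo's config already carries the bitsandbytes quantization settings; it needs a CUDA GPU with bitsandbytes installed:

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "unsloth/phi-4-unsloth-bnb-4bit"
tokenizer = AutoTokenizer.from_pretrained(repo)
# The quantization config ships with the checkpoint, so a plain load suffices.
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")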