Instructions to use magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF with llama-cpp-python:

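# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
    repo_id="magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF",
    filename="Apriel-1.5-15b-Thinker-IQ4_NL-EQKOUD-IQ4NL-H-MXFP4.gguf",
)

# Run a single chat turn; the result is an OpenAI-style response dict.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)

# Print only the assistant's reply text.
print(response["choices"][0]["message"]["content"])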

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL
# Run inference directly in the terminal:
llama-cli -hf magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL
# Run inference directly in the terminal:
llama-cli -hf magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggml-org/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL
# Run inference directly in the terminal:
./llama-cli -hf magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL

Build from source code

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL
# Run inference directly in the terminal:
./build/bin/llama-cli -hf magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL

Use Docker

docker model run hf.co/magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL
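
Once `llama-server` is running, it can also be queried from Python over its OpenAI-compatible API. The sketch below uses only the standard library. It assumes the server listens on `http://localhost:8080/v1` (the address used in the Pi and Hermes configuration on this page) and that the response follows the usual OpenAI chat-completions shape. `build_chat_request` and `ask` are illustrative helper names, not part of llama.cpp.

```python
import json
import urllib.request

# Model tag as used with `llama-server -hf ...` on this page.
MODEL_ID = "magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL"


def build_chat_request(prompt, base_url="http://localhost:8080/v1", model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the prompt to the running server and return the reply text.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server started as shown above, `print(ask("What is the capital of France?"))` should print the model's reply. For a vLLM server, pass `base_url="http://localhost:8000/v1"` instead.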

LM Studio
Jan

vLLM

How to use magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL

Ollama
How to use magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF with Ollama:
```
ollama run hf.co/magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL
```

Unsloth Studio

How to use magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF to start chatting

Pi

How to use magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL

Run Hermes

hermes

Docker Model Runner
How to use magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF with Docker Model Runner:
```
docker model run hf.co/magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL
```

Lemonade

How to use magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull magiccodingman/Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF:IQ4_NL

Run and chat with the model

lemonade run user.Apriel-1.5-15b-Thinker-unsloth-MagicQuant-Hybrid-GGUF-IQ4_NL

List all available models

lemonade list

MagicQuant GGUF Hybrids - Apriel 1.5 15b Thinker

(DEPRECATED - Part of MagicQuant v1.0, which had significant flaws. Please use v2.0, which is production ready.)

MagicQuant is an automated quantization, benchmarking, and evolutionary hybrid-GGUF search system for LLMs.

Each release includes models optimized to outperform standard baseline quants (Q8, Q6, Q5, Q4). If a baseline GGUF exists in this repo, the evolutionary engine couldn’t beat it. If a baseline is missing, it’s because a hybrid configuration outperformed it so completely that including the baseline would've been pointless.

These hybrid GGUFs are built to be as small, fast, and low-drift as possible while preserving model capability.

To dive deeper into how MagicQuant works, see the main repo: MagicQuant on GitHub (by MagicCodingMan)

Notes:

The bit width shown by Hugging Face's hardware compatibility widget is usually wrong for these files. It doesn't understand hybrid mixes, so don't trust it.
Naming scheme can be found on the MagicQuant Wiki.
(tips) Less precision loss means less brain damage. More TPS means faster! Smaller is always better, right?

Precision Loss Guide

0–0.1% → God-tier, scientifically exact
0.1–1% → True near-lossless, agent-ready
1–3% → Minimal loss, great for personal use
3–5% → Borderline, but still functional
5%+ → Toys, not tools, outside MagicQuant’s scope

Learn more about precision loss here.
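
The guide above can be applied mechanically to the `avg_prec_loss` values in the tables below. This is a minimal sketch, not part of MagicQuant: `precision_tier` is a hypothetical helper, and it assumes each band's upper bound is exclusive (so exactly 0.1% falls into the second tier), which the guide itself does not specify.

```python
# (upper bound in percent, label) pairs taken from the Precision Loss Guide.
TIERS = [
    (0.1, "God-tier, scientifically exact"),
    (1.0, "True near-lossless, agent-ready"),
    (3.0, "Minimal loss, great for personal use"),
    (5.0, "Borderline, but still functional"),
]
OUT_OF_SCOPE = "Toys, not tools, outside MagicQuant's scope"


def precision_tier(loss_pct):
    """Map a precision loss in percent to its tier label.

    Assumption: upper bounds are exclusive; anything at 5% or above is out of scope.
    """
    if loss_pct < 0:
        raise ValueError("precision loss cannot be negative")
    for upper, label in TIERS:
        if loss_pct < upper:
            return label
    return OUT_OF_SCOPE


# Values from the tables on this page.
print(precision_tier(0.2032))   # IQ4_NL-EQKOUD-IQ4NL-H-Q5K
print(precision_tier(11.8277))  # MXFP4_MOE baseline
```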

Table - File Size + TPS + Avg Precision Loss

model_name	file_size_gb	bench_tps	avg_prec_loss
IQ4_NL-EQKOUD-IQ4NL-H-Q5K	7.65	103.32	0.2032%
mxfp4_moe-EHQKOUD-IQ4NL	7.57	118.12	0.2600%
IQ4_NL-EQKOUD-IQ4NL-H-MXFP4	7.55	98.73	0.8938%
IQ4_NL-EQKUD-IQ4NL-HO-MXFP4	7.52	111.12	1.5309%

Table - PPL Columns

model_name	gen	gen_er	code	code_er	math	math_er
IQ4_NL-EQKOUD-IQ4NL-H-Q5K	10.9722	0.2904	1.7539	0.0147	9.5560	0.2444
mxfp4_moe-EHQKOUD-IQ4NL	10.9975	0.2905	1.7551	0.0147	9.6002	0.2456
IQ4_NL-EQKOUD-IQ4NL-H-MXFP4	11.2165	0.2950	1.7563	0.0146	9.5709	0.2427
IQ4_NL-EQKUD-IQ4NL-HO-MXFP4	11.2991	0.2972	1.7638	0.0147	9.5010	0.2397

Table - Precision Loss Columns (%)

model_name	loss_general	loss_code	loss_math
IQ4_NL-EQKOUD-IQ4NL-H-Q5K	0.0674	0.3146	0.2276
mxfp4_moe-EHQKOUD-IQ4NL	0.1630	0.3832	0.2339
IQ4_NL-EQKOUD-IQ4NL-H-MXFP4	2.1576	0.4518	0.0720
IQ4_NL-EQKUD-IQ4NL-HO-MXFP4	2.9099	0.8808	0.8019

Baseline Models (Reference)

Table - File Size + TPS + Avg Precision Loss

model_name	file_size_gb	bench_tps	avg_prec_loss
BF16	26.88	40.48	0.0000%
Q8_0	14.29	57.55	0.2909%
Q6_K	11.03	76.32	0.5890%
Q5_K	9.56	74.51	0.7029%
Q4_K_M	8.18	87.67	1.0006%
IQ4_NL	7.81	93.13	2.7508%
MXFP4_MOE	7.15	75.44	11.8277%
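
One way to read "outperform" is Pareto dominance: a hybrid beats a baseline if it is no larger, no slower, and no less precise, and strictly better on at least one of the three. The sketch below applies that reading to the size, TPS, and average-loss figures from the two tables. It is an interpretation for illustration only; MagicQuant's actual selection criteria may differ, and `Quant`, `dominates`, and `dominators` are hypothetical names.

```python
from typing import NamedTuple


class Quant(NamedTuple):
    name: str
    size_gb: float
    tps: float
    loss_pct: float


# Figures copied from the "File Size + TPS + Avg Precision Loss" tables.
HYBRIDS = [
    Quant("IQ4_NL-EQKOUD-IQ4NL-H-Q5K", 7.65, 103.32, 0.2032),
    Quant("mxfp4_moe-EHQKOUD-IQ4NL", 7.57, 118.12, 0.2600),
    Quant("IQ4_NL-EQKOUD-IQ4NL-H-MXFP4", 7.55, 98.73, 0.8938),
    Quant("IQ4_NL-EQKUD-IQ4NL-HO-MXFP4", 7.52, 111.12, 1.5309),
]
BASELINES = [
    Quant("BF16", 26.88, 40.48, 0.0000),
    Quant("Q8_0", 14.29, 57.55, 0.2909),
    Quant("Q6_K", 11.03, 76.32, 0.5890),
    Quant("Q5_K", 9.56, 74.51, 0.7029),
    Quant("Q4_K_M", 8.18, 87.67, 1.0006),
    Quant("IQ4_NL", 7.81, 93.13, 2.7508),
    Quant("MXFP4_MOE", 7.15, 75.44, 11.8277),
]


def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    no_worse = a.size_gb <= b.size_gb and a.tps >= b.tps and a.loss_pct <= b.loss_pct
    better = a.size_gb < b.size_gb or a.tps > b.tps or a.loss_pct < b.loss_pct
    return no_worse and better


def dominators(baseline):
    """Names of the hybrids that Pareto-dominate the given baseline."""
    return [h.name for h in HYBRIDS if dominates(h, baseline)]


for b in BASELINES:
    print(f"{b.name}: {dominators(b)}")
```

Under this reading no hybrid dominates BF16 (nothing matches its zero loss) or MXFP4_MOE (it is the smallest file), so those two only lose on the other axes.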

Table - PPL Columns

model_name	gen	gen_er	code	code_er	math	math_er
BF16	10.9796	0.2924	1.7484	0.0148	9.5778	0.2441
Q8_0	11.0342	0.2942	1.7481	0.0148	9.6121	0.2453
Q6_K	11.0505	0.2947	1.7507	0.0148	9.4830	0.2403
Q5_K	11.1729	0.2987	1.7529	0.0148	9.5865	0.2440
Q4_K_M	11.2224	0.3006	1.7589	0.0149	9.5960	0.2443
IQ4_NL	11.4168	0.3060	1.7616	0.0149	9.9145	0.2571
MXFP4_MOE	13.6045	0.3840	1.8160	0.0159	10.3162	0.2712

Table - Precision Loss Columns (%)

model_name	loss_general	loss_code	loss_math
BF16	0.0000	0.0000	0.0000
Q8_0	0.4973	0.0172	0.3581
Q6_K	0.6457	0.1315	0.9898
Q5_K	1.7605	0.2574	0.0908
Q4_K_M	2.2114	0.6005	0.1900
IQ4_NL	3.9819	0.7550	3.5154
MXFP4_MOE	23.9071	3.8664	7.7095
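
The loss columns are consistent with the absolute relative change in perplexity against the BF16 baseline, expressed in percent, with `avg_prec_loss` as the mean over the three domains. That reading is inferred from the published numbers rather than taken from the MagicQuant source, so treat the sketch below as a way to check the tables, not as the tool's implementation. The function names are hypothetical.

```python
# BF16 perplexities from the baseline "PPL Columns" table.
BF16_PPL = {"general": 10.9796, "code": 1.7484, "math": 9.5778}


def precision_loss_pct(ppl, baseline_ppl):
    """Absolute relative PPL deviation from the baseline, in percent (assumed formula)."""
    return abs(ppl - baseline_ppl) / baseline_ppl * 100.0


def avg_precision_loss_pct(ppl_by_domain, baseline=BF16_PPL):
    """Mean of the per-domain losses, matching how avg_prec_loss appears to be derived."""
    losses = [precision_loss_pct(ppl_by_domain[d], baseline[d]) for d in baseline]
    return sum(losses) / len(losses)


# PPL row for IQ4_NL-EQKOUD-IQ4NL-H-Q5K from the hybrid "PPL Columns" table.
hybrid_ppl = {"general": 10.9722, "code": 1.7539, "math": 9.5560}

for domain, ppl in hybrid_ppl.items():
    print(domain, round(precision_loss_pct(ppl, BF16_PPL[domain]), 4))
print("avg", round(avg_precision_loss_pct(hybrid_ppl), 4))
```

The absolute value matters: this hybrid's general PPL is slightly below BF16, yet the table still reports a positive loss, so deviation in either direction appears to count as drift.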

Support

I’m a solo developer working full time for myself to achieve my dream, pouring nights and weekends into open protocols and tools that I hope make the world a little better. If you chip in, you're helping me keep the lights on while I keep shipping.

Click here to see ways to support - BTC, PayPal, GitHub Sponsors.

Or, just drop a like on the repo :)