---
library_name: transformers
license: apache-2.0
tags:
- math
- reasoning
- text-generation
- ads
- distillation
language:
- en
pipeline_tag: text-generation
base_model:
- NoesisLab/Kai-3B-Instruct
---
This is a static quantization of [NoesisLab/Kai-3B-Instruct](https://huggingface.co/NoesisLab/Kai-3B-Instruct), made by [SimplySara](https://huggingface.co/SimplySara).

Note from NoesisLab: "Due to the ADS distillation method, this model is highly sensitive to quantization noise. Q8_0 or Q6_K are strongly recommended for preserving both logical integrity and conversational alignment. Q4 variants may exhibit template collapse."

| Model                             |   Size_GB |   BPW |    PPL_Q |   KLD_Mean |   KLD_Max | Top_P_Match   |
|:----------------------------------|----------:|------:|---------:|-----------:|----------:|:--------------|
| Kai-3B-Instruct-BF16.gguf         |     5.735 | 16.02 |  12.2614 |  -1.2e-05  |  4e-06    | 100.000%      |
| Kai-3B-Instruct-MXFP4_MOE.gguf    |     3.051 |  8.52 |  12.268  |   0.001919 |  0.161748 | 97.288%       |
| Kai-3B-Instruct-i1-MXFP4_MOE.gguf |     3.051 |  8.52 |  12.268  |   0.001919 |  0.161748 | 97.288%       |
| Kai-3B-Instruct-Q8_0.gguf         |     3.051 |  8.52 |  12.268  |   0.001919 |  0.161748 | 97.288%       |
| Kai-3B-Instruct-i1-Q8_0.gguf      |     3.051 |  8.52 |  12.268  |   0.001919 |  0.161748 | 97.288%       |
| Kai-3B-Instruct-Q6_K.gguf         |     2.357 |  6.58 |  12.3055 |   0.009404 |  0.366649 | 94.435%       |
| Kai-3B-Instruct-i1-Q6_K.gguf      |     2.357 |  6.58 |  12.3486 |   0.008842 |  0.528699 | 94.605%       |
| Kai-3B-Instruct-Q5_1.gguf         |     2.173 |  6.07 |  12.4607 |   0.022546 |  1.62058  | 92.336%       |
| Kai-3B-Instruct-i1-Q5_1.gguf      |     2.173 |  6.07 |  12.3913 |   0.015555 |  0.887861 | 93.164%       |
| Kai-3B-Instruct-Q5_K_M.gguf       |     2.062 |  5.76 |  12.3932 |   0.015953 |  2.06684  | 93.315%       |
| Kai-3B-Instruct-i1-Q5_K_M.gguf    |     2.062 |  5.76 |  12.3974 |   0.014712 |  1.21054  | 93.344%       |
| Kai-3B-Instruct-i1-Q5_0.gguf      |     2.014 |  5.63 |  12.3845 |   0.018582 |  1.7811   | 92.676%       |
| Kai-3B-Instruct-Q5_K_S.gguf       |     2.009 |  5.61 |  12.4705 |   0.021112 |  2.25188  | 92.477%       |
| Kai-3B-Instruct-i1-Q5_K_S.gguf    |     2.009 |  5.61 |  12.422  |   0.016098 |  1.02742  | 93.198%       |
| Kai-3B-Instruct-Q5_0.gguf         |     2.009 |  5.61 |  12.5354 |   0.024549 |  2.64757  | 91.846%       |
| Kai-3B-Instruct-i1-Q4_1.gguf      |     1.845 |  5.16 |  12.6693 |   0.039282 |  2.17269  | 90.104%       |
| Kai-3B-Instruct-Q4_1.gguf         |     1.845 |  5.16 |  12.8411 |   0.070893 |  9.75963  | 87.274%       |
| Kai-3B-Instruct-i1-Q4_K_M.gguf    |     1.784 |  4.98 |  12.562  |   0.033791 |  2.37929  | 90.693%       |
| Kai-3B-Instruct-Q4_K_M.gguf       |     1.784 |  4.98 |  12.5551 |   0.039329 |  8.08951  | 90.011%       |
| Kai-3B-Instruct-IQ4_NL.gguf       |     1.697 |  4.74 |  12.6349 |   0.04746  |  3.75837  | 89.164%       |
| Kai-3B-Instruct-Q4_K_S.gguf       |     1.693 |  4.73 |  12.6881 |   0.050317 |  7.15421  | 88.889%       |
| Kai-3B-Instruct-i1-Q4_K_S.gguf    |     1.693 |  4.73 |  12.672  |   0.038976 |  2.35062  | 90.141%       |
| Kai-3B-Instruct-i1-Q4_0.gguf      |     1.687 |  4.71 |  12.9318 |   0.056914 |  4.90942  | 88.242%       |
| Kai-3B-Instruct-i1-IQ4_NL.gguf    |     1.686 |  4.71 |  12.7029 |   0.041041 |  2.82814  | 89.995%       |
| Kai-3B-Instruct-Q4_0.gguf         |     1.682 |  4.7  |  13.1831 |   0.079359 |  5.30813  | 86.546%       |
| Kai-3B-Instruct-IQ4_XS.gguf       |     1.619 |  4.52 |  12.6642 |   0.048527 |  3.11693  | 89.010%       |
| Kai-3B-Instruct-i1-IQ4_XS.gguf    |     1.605 |  4.48 |  12.7351 |   0.042119 |  2.81661  | 89.976%       |
| Kai-3B-Instruct-Q3_K_L.gguf       |     1.574 |  4.4  |  13.2229 |   0.095355 |  8.63835  | 85.518%       |
| Kai-3B-Instruct-i1-Q3_K_L.gguf    |     1.574 |  4.4  |  13.2477 |   0.084668 |  5.71143  | 86.163%       |
| Kai-3B-Instruct-Q3_K_M.gguf       |     1.463 |  4.09 |  13.3455 |   0.112669 |  9.19842  | 84.135%       |
| Kai-3B-Instruct-i1-Q3_K_M.gguf    |     1.463 |  4.09 |  13.4095 |   0.095939 |  7.93677  | 85.368%       |
| Kai-3B-Instruct-i1-IQ3_M.gguf     |     1.368 |  3.82 |  13.1481 |   0.112437 |  6.45799  | 84.307%       |
| Kai-3B-Instruct-IQ3_M.gguf        |     1.368 |  3.82 |  14.5693 |   0.246713 |  7.29781  | 77.711%       |
| Kai-3B-Instruct-IQ3_S.gguf        |     1.339 |  3.74 |  20.2851 |   0.623557 | 14.9444   | 66.169%       |
| Kai-3B-Instruct-i1-IQ3_S.gguf     |     1.339 |  3.74 |  13.2823 |   0.120975 |  6.12451  | 83.724%       |
| Kai-3B-Instruct-i1-Q3_K_S.gguf    |     1.334 |  3.73 |  14.4279 |   0.196396 | 11.9249   | 79.536%       |
| Kai-3B-Instruct-Q3_K_S.gguf       |     1.334 |  3.73 |  14.5753 |   0.20947  | 10.2762   | 79.235%       |
| Kai-3B-Instruct-i1-IQ3_XS.gguf    |     1.277 |  3.57 |  13.5713 |   0.149838 |  5.19091  | 81.978%       |
| Kai-3B-Instruct-i1-IQ3_XXS.gguf   |     1.181 |  3.3  |  14.4968 |   0.218333 |  7.41132  | 78.317%       |
| Kai-3B-Instruct-i1-Q2_K.gguf      |     1.167 |  3.26 |  17.0515 |   0.362859 | 13.7054   | 73.511%       |
| Kai-3B-Instruct-Q2_K.gguf         |     1.167 |  3.26 |  18.421  |   0.471699 | 10.9955   | 70.276%       |
| Kai-3B-Instruct-i1-Q2_K_S.gguf    |     1.096 |  3.06 |  19.0203 |   0.47105  |  9.39981  | 70.322%       |
| Kai-3B-Instruct-i1-IQ2_M.gguf     |     1.048 |  2.93 |  16.8179 |   0.377914 |  8.06048  | 72.505%       |
| Kai-3B-Instruct-i1-IQ2_S.gguf     |     0.974 |  2.72 |  18.9657 |   0.507571 | 10.146    | 68.855%       |
| Kai-3B-Instruct-i1-IQ2_XS.gguf    |     0.946 |  2.64 |  20.7434 |   0.60263  | 12.2848   | 66.248%       |
| Kai-3B-Instruct-i1-IQ2_XXS.gguf   |     0.868 |  2.42 |  28.0716 |   0.912772 | 20.8551   | 59.005%       |
| Kai-3B-Instruct-i1-IQ1_M.gguf     |     0.776 |  2.17 |  56.0938 |   1.71797  | 16.7686   | 46.262%       |
| Kai-3B-Instruct-i1-IQ1_S.gguf     |     0.72  |  2.01 | 142.119  |   2.71244  | 23.1949   | 35.970%       |
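
The table does not define its divergence columns. As a rough guide, and assuming they follow the usual conventions of llama.cpp-style quantization reports (an assumption, since the measurement setup is not stated here), `KLD_Mean` and `KLD_Max` would be the mean and maximum per-token KL divergence between the reference model's next-token distribution and the quantized model's, and `Top_P_Match` would be the share of positions where both models rank the same token first. A minimal pure-Python sketch on toy distributions; the function names are illustrative and not part of any tool:

```python
import math

def kl_divergence(p_ref, q_quant):
    """KL(p_ref || q_quant) in nats for a single token position."""
    return sum(p * math.log(p / q) for p, q in zip(p_ref, q_quant) if p > 0)

def top_token_match(p_ref, q_quant):
    """True when both distributions assign the highest probability to the same token."""
    ref_top = max(range(len(p_ref)), key=p_ref.__getitem__)
    quant_top = max(range(len(q_quant)), key=q_quant.__getitem__)
    return ref_top == quant_top

# Toy next-token distributions over a 3-token vocabulary at two positions.
reference = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1]]
quantized = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]

klds = [kl_divergence(p, q) for p, q in zip(reference, quantized)]
kld_mean = sum(klds) / len(klds)
kld_max = max(klds)
matches = [top_token_match(p, q) for p, q in zip(reference, quantized)]
match_rate = sum(matches) / len(matches)

print(f"KLD mean={kld_mean:.4f}, KLD max={kld_max:.4f}, top-token match={match_rate:.0%}")
```

Under this reading, lower KLD values and a higher match percentage mean the quantized file behaves more like the reference. A large `KLD_Max` paired with a small `KLD_Mean` would point to a few badly disturbed positions rather than uniform drift.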


---
# Kai-3B-Instruct

A 3B-parameter instruction-tuned language model optimized for reasoning, math, and code generation tasks, powered by our new **ADS (Adaptive Dual-Search Distillation)** technique.

## Model Details

| | |
|---|---|
| **Model** | Kai-3B-Instruct |
| **Architecture** | SmolLM3ForCausalLM |
| **Parameters** | 3B |
| **Hidden size** | 2048 |
| **Intermediate size** | 11008 |
| **Layers** | 36 |
| **Attention heads** | 16 (4 KV heads, GQA) |
| **Context length** | 65536 |
| **Precision** | bfloat16 |
| **Vocab size** | 128,256 |
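
The GQA configuration above largely determines inference memory at long context. As a back-of-the-envelope sketch, assuming a head dimension of hidden size divided by attention heads (2048 / 16 = 128, which the table does not state explicitly) and an unquantized bfloat16 cache, the KV cache size can be estimated as follows:

```python
def kv_cache_bytes(n_tokens, layers=36, kv_heads=4, head_dim=2048 // 16, bytes_per_value=2):
    """Estimate KV cache size: one key and one value vector per layer, KV head, and token.

    head_dim and bytes_per_value are assumptions (128 and bfloat16), not documented values.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * n_tokens

per_token = kv_cache_bytes(1)
full_context = kv_cache_bytes(65536)

print(f"{per_token / 1024:.0f} KiB per token")
print(f"{full_context / 1024**3:.1f} GiB at the full 65536-token context")
```

With 16 KV heads instead of 4 (plain multi-head attention), the same estimate would be four times larger, which is the practical benefit of GQA at this context length.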

## What is ADS?

**Adaptive Dual-Search Distillation** treats model fine-tuning as a constrained optimization problem inspired by Operations Research. The core mechanism is a dynamic loss function with a stateful dual penalty factor that adapts based on embedding space entropy — forcing the model to converge to high-confidence predictions at difficult reasoning points, without modifying the model architecture.
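
The ADS loss itself is not specified here, so the following is not NoesisLab's implementation. It is a minimal sketch of the generic technique the description resembles: a cross-entropy loss with an entropy constraint, relaxed through a Lagrange multiplier that is updated by projected dual ascent. It uses the entropy of the output distribution as a stand-in for the "embedding space entropy" mentioned above, and `DualPenalty`, `target_entropy`, and `step_size` are hypothetical names and values chosen for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

class DualPenalty:
    """Stateful penalty factor (Lagrange multiplier) for an entropy constraint.

    Illustrative only: the constraint and update rule are assumptions,
    not the published ADS method.
    """

    def __init__(self, target_entropy=0.5, step_size=0.1):
        self.lam = 0.0
        self.target_entropy = target_entropy
        self.step_size = step_size

    def loss(self, logits, target_index):
        probs = softmax(logits)
        cross_entropy = -math.log(probs[target_index])
        violation = entropy(probs) - self.target_entropy
        total = cross_entropy + self.lam * violation
        # Dual ascent: grow the penalty while predictions are too uncertain,
        # shrink it once they are confident, and never let it go below zero.
        self.lam = max(0.0, self.lam + self.step_size * violation)
        return total

penalty = DualPenalty()
for _ in range(3):
    penalty.loss([0.2, 0.1, 0.0], target_index=0)  # near-uniform, high entropy
lam_after_uncertain = penalty.lam

penalty.loss([10.0, 0.0, 0.0], target_index=0)  # confident, low entropy
lam_after_confident = penalty.lam

print(f"penalty after uncertain steps: {lam_after_uncertain:.3f}")
print(f"penalty after a confident step: {lam_after_confident:.3f}")
```

The point of the sketch is the statefulness: the penalty factor carries over between steps, so repeated uncertainty at hard positions raises the pressure toward confident predictions, while the model architecture is left untouched.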

## Benchmark Results

![Performance Comparison Across General, Code, and Math Benchmarks](model_comparison.png)

### General (5-shot, log-likelihood)

| Model | Params | MMLU | ARC-c (acc_norm) | HellaSwag (acc_norm) | PIQA (acc_norm) |
|---|:---:|:---:|:---:|:---:|:---:|
| TinyLlama | 1.1B | ~26.0% | ~33.0% | ~60.0% | ~71.0% |
| SmolLM2 | 1.7B | ~35.0% | ~38.0% | ~65.0% | ~74.0% |
| Llama-2-7B | 7B | 45.3% | 46.2% | 77.2% | 79.8% |
| Gemma-2-2B | 2.6B | ~52.0% | ~53.0% | 75.0% | ~78.0% |
| **Kai-3B-Instruct** | **3B** | **53.62%** | **51.88%** | **69.53%** | **77.53%** |
| Qwen2.5-3B | 3B | ~63.0% | ~55.0% | ~73.0% | ~80.0% |

### Code Generation: HumanEval (Pass@1, 0-shot)

| Model | Params | HumanEval (Pass@1) | Notes |
|---|:---:|:---:|---|
| Llama-2-7B | 7B | ~12.8% | 3x overtake — smaller model, far better code |
| SmolLM2-1.7B | 1.7B | ~25.0% | ADS delivers +14pp pure gain |
| Gemma-2-2B | 2.6B | ~30.0% | Surpasses Google's heavily distilled 2B flagship |
| **Kai-3B-Instruct** | **3B** | **39.02%** | **ADS topological pruning, full pipeline** |
| GPT-3.5 (Legacy) | 175B | ~48.0% | Kai-3B trails the original GPT-3.5 by only ~9pp |

### Math: GSM8K (0-shot)

| Model | Params | GSM8K (exact_match) |
|---|:---:|:---:|
| **Kai-3B-Instruct** | **3B** | **39.27%** |

### Key Observations

1. **Surpasses Llama-2-7B on MMLU and ARC-Challenge**: Kai-3B outperforms Llama-2-7B on MMLU (+8.3pp) and ARC-Challenge (+5.7pp) with less than half the parameters. Llama-2-7B remains ahead on HellaSwag (77.2% vs. 69.53%) and PIQA (79.8% vs. 77.53%).

2. **Competitive with Gemma-2-2B**: Exceeds Google's Gemma-2-2B on MMLU (+1.6pp) and comes within ~0.5pp on PIQA, despite Gemma being trained with significantly more compute. Gemma-2-2B leads on ARC-Challenge (~53.0% vs. 51.88%) and HellaSwag (75.0% vs. 69.53%).

3. **HellaSwag**: At **69.53%**, Kai-3B surpasses both sub-2B models (TinyLlama at ~60.0%, SmolLM2 at ~65.0%) and trails the compute-heavy Qwen2.5-3B by only ~3.5pp.

4. **PIQA**: At **77.53%**, Kai-3B nearly matches Gemma-2-2B (~78.0%) and approaches the 3B-class ceiling set by Qwen2.5-3B (~80.0%).
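
The percentage-point gaps quoted in these observations can be recomputed from the General table. A short sketch that does so, treating the approximate ("~") baseline scores as exact for the sake of the arithmetic:

```python
# Scores copied from the General table; "~" values are treated as exact here.
SCORES = {
    "Kai-3B-Instruct": {"MMLU": 53.62, "ARC-c": 51.88, "HellaSwag": 69.53, "PIQA": 77.53},
    "Llama-2-7B": {"MMLU": 45.3, "ARC-c": 46.2, "HellaSwag": 77.2, "PIQA": 79.8},
    "Gemma-2-2B": {"MMLU": 52.0, "ARC-c": 53.0, "HellaSwag": 75.0, "PIQA": 78.0},
    "Qwen2.5-3B": {"MMLU": 63.0, "ARC-c": 55.0, "HellaSwag": 73.0, "PIQA": 80.0},
}

def gaps_vs(baseline, model="Kai-3B-Instruct"):
    """Percentage-point difference (model minus baseline) for each benchmark."""
    return {
        task: round(SCORES[model][task] - SCORES[baseline][task], 2)
        for task in SCORES[model]
    }

for baseline in ("Llama-2-7B", "Gemma-2-2B", "Qwen2.5-3B"):
    print(baseline, gaps_vs(baseline))
```

Positive values mean Kai-3B-Instruct scores higher than the baseline on that benchmark; negative values mean it trails.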

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "NoesisLab/Kai-3B-Instruct",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("NoesisLab/Kai-3B-Instruct")

messages = [{"role": "user", "content": "What is 25 * 4?"}]
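# add_generation_prompt=True appends the assistant turn header so the model
# replies instead of continuing the user message; return_dict=True also
# returns the attention mask that generate() expects.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
output = model.generate(**inputs, max_new_tokens=256)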
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Citation

```bibtex
@misc{noesislab2026kai3b,
  title={Kai-3B-Instruct},
  author={NoesisLab},
  year={2026},
  url={https://huggingface.co/NoesisLab/Kai-3B-Instruct}
}
```

## License

Apache 2.0