See IQuest-Coder-V1-40B-Loop-Instruct MLX in action in the demonstration video.

The q6.5-bit mixed quantization typically achieves 1.128 perplexity in our testing:
| Quantization | Perplexity |
|---|---|
| q2.5 | 41.293 |
| q3.5 | 1.900 |
| q4.5 | 1.168 |
| q4.8 | 1.140 |
| q5.5 | 1.141 |
| q6.5 | 1.128 |
| q8.5 | 1.128 |
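As a rough sanity check (a sketch, not part of the release), the weight-memory footprint of a mixed-precision quantization can be estimated from the average bits per weight; at 40B parameters and ~6.5 bits per weight this lands near the ~30 GB reported below.

```python
def quantized_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Estimate quantized weight size in GiB: params x bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 2**30

# 40B parameters at an average of 6.5 bits per weight
print(f"{quantized_size_gib(40e9, 6.5):.1f} GiB")  # ~30.3 GiB
```

This counts weights only; the KV cache and activations add to the total at inference time.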
## Usage Notes

Tested on an M3 Ultra using the Inferencer app v1.9.1:
- Single inference ~9 tokens/s @ 1000 tokens
- Batched inference ~14 tokens/s total across two concurrent inferences
- Memory usage: ~30 GB
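From the figures above, the per-stream and aggregate gains of batching can be worked out directly (simple arithmetic on the reported numbers, not additional measurements):

```python
single_stream = 9.0      # tokens/s, single inference
batched_total = 14.0     # tokens/s, summed across both inferences
streams = 2

per_stream = batched_total / streams      # each batched stream runs slower...
aggregate_speedup = batched_total / single_stream  # ...but total throughput rises

print(f"{per_stream:.1f} tok/s per stream")        # 7.0 tok/s
print(f"{aggregate_speedup:.2f}x aggregate")       # ~1.56x
```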
Quantized with a modified version of MLX 0.30.

For more details, see the demonstration video or visit IQuest-Coder-V1-40B-Instruct.
Model size: 40B params
Tensor types: BF16 · U32
Model tree for inferencerlabs/IQuest-Coder-V1-40B-Loop-Instruct-MLX-6.5bit

Base model: IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct