IBM Granite 4.0-H-Tiny with its MLP expert layers quantized with bitsandbytes, saved in the custom StoneBnB format to enable VRAM-efficient training.

(While this quant can technically be used for Transformers inference, it is not supported by any common inference server, and the GGUF quants are probably a better choice for that. StoneBnB is intended for fine-tuning, producing an adapter that you can merge into the unquantized model.)
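A minimal sketch of the intended workflow, assuming a standard transformers + peft QLoRA-style setup: train a LoRA adapter on top of this quantized checkpoint, then merge the adapter into the unquantized base. The base repo name, LoRA hyperparameters, target modules, and output paths below are placeholders, not taken from this repo.

```python
# Hypothetical fine-tune-then-merge sketch; names and settings are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, PeftModel

QUANT_REPO = "ramendik/granite-4.0-h-tiny-stonebnb"   # this repo
BASE_REPO = "ibm-granite/granite-4.0-h-tiny"           # assumed unquantized base

# Load the StoneBnB-quantized model for training
model = AutoModelForCausalLM.from_pretrained(QUANT_REPO, device_map="auto")

# Attach a LoRA adapter; target_modules is a placeholder and depends on the architecture
lora = LoraConfig(r=16, lora_alpha=32, target_modules="all-linear", task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# ... run your training loop / Trainer here ...
model.save_pretrained("granite-tiny-adapter")

# Merge the trained adapter into the unquantized base for deployment
base = AutoModelForCausalLM.from_pretrained(BASE_REPO, torch_dtype="auto")
merged = PeftModel.from_pretrained(base, "granite-tiny-adapter").merge_and_unload()
merged.save_pretrained("granite-tiny-merged")
```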
