why no Q8_K quant?

#10
by MasterM007 - opened

Hi, may I ask why there's no Q8_K quant? It should give slightly higher quality compared to Q8_0, shouldn't it?

I'll try to make it myself, but I'm not sure I'd use the correct method, because ChatGPT says GGUF is for CPU, and I need a GGUF quant that also runs on GPU/CUDA...

QuantStack org

There is no real reason to use it, since Q8_0 is basically indistinguishable from f16/bf16.
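To give a feel for why Q8_0 is so close to full precision: each block of 32 weights shares one scale, and the values are stored as int8. Here is a minimal pure-Python sketch of that round-trip (variable and function names are ours for illustration, not llama.cpp's actual API) showing that the worst-case reconstruction error is a fraction of a percent of the largest weight:

```python
# Sketch of a Q8_0-style scheme: blocks of 32 weights, one shared
# per-block scale, int8 codes. Illustrative only; not llama.cpp code.
import random

BLOCK = 32  # Q8_0 block size in llama.cpp

def q8_0_roundtrip(xs):
    """Quantize a list of floats block-by-block and dequantize back."""
    out = []
    for i in range(0, len(xs), BLOCK):
        block = xs[i:i + BLOCK]
        amax = max(abs(v) for v in block)
        d = amax / 127.0 if amax else 0.0          # per-block scale
        qs = [round(v / d) if d else 0 for v in block]  # int8 codes
        out.extend(q * d for q in qs)
    return out

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1024)]
deq = q8_0_roundtrip(weights)
err = max(abs(a - b) for a, b in zip(weights, deq))
scale = max(abs(v) for v in weights)
print(f"max abs error: {err:.5f} ({err / scale:.4%} of the largest weight)")
```

The per-block error is bounded by half a quantization step, i.e. at most `amax / 254` per block, so the relative error stays well under 0.5%; the extra precision a Q8_K-style layout could add on top of that is below the noise floor of the model.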
