why no Q8_K quant?
#10
by
MasterM007
- opened
Hi, may I ask why there is no Q8_K quant? Shouldn't it give slightly higher quality than Q8_0?
I'll try to make it myself, but I'm not sure I'll use the correct method, because ChatGPT says GGUF is for CPU, and I need a GGUF quant that can also use the GPU/CUDA...
There is no real reason to use it: Q8_0 is practically indistinguishable from f16/bf16. Q8_K is mainly used internally by llama.cpp as an intermediate format for the k-quants, not as a model quantization target. And GGUF is not CPU-only: llama.cpp can offload layers to the GPU with CUDA (`-ngl`).
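To see why 8-bit block quantization loses so little, here is a toy NumPy sketch in the spirit of Q8_0 (per-block absmax scale over blocks of 32 values, int8 codes). This is an illustration of the idea, not llama.cpp's actual implementation; the function names are made up for this example.

```python
import numpy as np

def quantize_q8_0_like(x: np.ndarray, block: int = 32):
    """Toy block-wise 8-bit quantization: one fp scale per block of 32 values."""
    xb = x.reshape(-1, block)
    scale = np.abs(xb).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero for all-zero blocks
    q = np.round(xb / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)  # stand-in for a weight tensor

q, s = quantize_q8_0_like(w)
w_hat = dequantize(q, s)

# Worst-case error relative to the largest weight magnitude
rel_err = np.abs(w - w_hat).max() / np.abs(w).max()
print(f"max relative error: {rel_err:.5f}")
```

With 8 bits per value the rounding error per block is at most half a quantization step, i.e. about absmax/254, so the relative error stays well under 1% — which is why going finer than Q8_0 buys essentially nothing in practice.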