why no Q8_K quant?

#10
by MasterM007 - opened

Hi, may I ask why there's no Q8_K quant? It should give slightly higher quality compared to Q8_0, shouldn't it?

I'll try to make it myself, but I'm not sure I'd use the correct method, because ChatGPT says GGUF is for CPU, and I need a GGUF quant that also runs on GPU/CUDA...

QuantStack org

There is no real reason to use it, since Q8_0 is basically indistinguishable from f16/bf16.
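To give a feel for why Q8_0 is so close to full precision: each block of 32 weights shares one scale, and the values are stored as int8. Here is a minimal pure-Python sketch of that round-trip (variable and function names are ours for illustration, not llama.cpp's actual API) showing that the worst-case reconstruction error is a fraction of a percent of the largest weight:

```python
# Sketch of a Q8_0-style scheme: blocks of 32 weights, one shared
# per-block scale, int8 codes. Illustrative only; not llama.cpp code.
import random

BLOCK = 32  # Q8_0 block size in llama.cpp

def q8_0_roundtrip(xs):
    """Quantize a list of floats block-by-block and dequantize back."""
    out = []
    for i in range(0, len(xs), BLOCK):
        block = xs[i:i + BLOCK]
        amax = max(abs(v) for v in block)
        d = amax / 127.0 if amax else 0.0          # per-block scale
        qs = [round(v / d) if d else 0 for v in block]  # int8 codes
        out.extend(q * d for q in qs)
    return out

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1024)]
deq = q8_0_roundtrip(weights)
err = max(abs(a - b) for a, b in zip(weights, deq))
scale = max(abs(v) for v in weights)
print(f"max abs error: {err:.5f} ({err / scale:.4%} of the largest weight)")
```

The per-block error is bounded by half a quantization step, i.e. at most `amax / 254` per block, so the relative error stays well under 0.5%; the extra precision a Q8_K-style layout could add on top of that is below the noise floor of the model.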
