9.31mb first part Q5?
#3, opened by inritwritten
I see the first part of Q5 is only 9.3 MB. Before I download this, was there an upload problem with that file? Thanks for your hard work on these model conversions!
I think it's intentional, it's the same in iq3.
good eye, and yes that is intentional like @mtcl noticed. there is an argument in the gguf-split command (`--no-tensor-first-split`) to put only metadata and no tensor data in the first split. this makes it easy to re-upload/re-download just the very small 1st split if the original model later updates something in the metadata, like the chat template.
i've not done it before so it is new for my collections. i believe the ggml-org folks do similar sometimes.
here is the full split command for example:
numactl -N "$SOCKET" -m "$SOCKET" \
    ./build/bin/llama-gguf-split \
        --dry-run \
        --split \
        --split-max-size 50G \
        --no-tensor-first-split \
        "$model" \
        /mnt/raid/hf/GLM-4.7-GGUF/smol-IQ1_KT/GLM-4.7-smol-IQ1_KT
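
if you want to confirm that a small first split really is metadata-only and not a truncated upload, one option is to read the tensor count from its header. this is only a rough sketch, not part of llama.cpp: the helper name `gguf_tensor_count` is made up here, and it assumes the GGUF v2/v3 header layout (4-byte magic `GGUF`, uint32 version, then a uint64 tensor count) and a little-endian machine. a first split written with `--no-tensor-first-split` should, as far as i understand it, report 0.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a llama.cpp tool): print the tensor count
# stored in a GGUF file header.
# Assumed layout (GGUF v2/v3, little-endian):
#   bytes 0-3   magic "GGUF"
#   bytes 4-7   uint32 version
#   bytes 8-15  uint64 tensor count
gguf_tensor_count() {
    local file="$1"
    local magic
    magic="$(head -c 4 "$file")"
    if [ "$magic" != "GGUF" ]; then
        echo "not a GGUF file: $file" >&2
        return 1
    fi
    # od decodes in host byte order, so this assumes a little-endian host.
    od -A n -t u8 -j 8 -N 8 "$file" | tr -d '[:space:]'
    echo
}
```

to use it, source the snippet and pass the path of the first split (the file whose name ends in `-00001-of-NNNNN.gguf`). a result of 0 would suggest the file is intact and simply carries no tensors.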