Report #13310
[tooling] Unexpected VRAM usage or quality degradation despite 'Q4_K_M' in the GGUF filename
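Run `gguf-dump --json model.gguf | jq '.metadata["general.architecture"].value, .metadata["general.file_type"].value, ([.tensors[].type] | group_by(.) | map({type: .[0], count: length}))'` to print the architecture, the declared file type, and a count of tensors per quantization type. Before loading, verify that `general.file_type` (the `ftype`, an integer from llama.cpp's `llama_ftype` enum, e.g. 15 for Q4_K_M, 17 for Q5_K_M, 2 for Q4_0) matches what the filename claims. The jq paths assume the JSON layout emitted by recent versions of the `gguf` Python package; adjust them if your version differs.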
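If `gguf-dump` or `jq` is not available, the same check can be sketched with the Python standard library alone. The snippet below is a minimal sketch, not the tool's implementation: it assumes a little-endian GGUF v2 or v3 file, reads only the metadata key/value section, and maps a subset of `llama_ftype` values to names. The helper names (`read_file_type`, `claimed_quant`, `label_matches`) are hypothetical and chosen for illustration.

```python
import os
import re
import struct

# GGUF scalar value types -> struct formats (assumes a little-endian file).
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9

# Subset of llama.cpp's llama_ftype enum (values of general.file_type).
FTYPE_NAMES = {0: "F32", 1: "F16", 2: "Q4_0", 3: "Q4_1", 7: "Q8_0",
               8: "Q5_0", 9: "Q5_1", 10: "Q2_K", 11: "Q3_K_S", 12: "Q3_K_M",
               13: "Q3_K_L", 14: "Q4_K_S", 15: "Q4_K_M", 16: "Q5_K_S",
               17: "Q5_K_M", 18: "Q6_K"}


def _read(f, fmt):
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]


def _read_string(f):
    length = _read(f, "<Q")  # 64-bit length prefix (GGUF v2 and later)
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    return _read(f, _SCALARS[vtype])


def read_file_type(path):
    """Return the integer general.file_type, or None if the key is absent."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        if _read(f, "<I") < 2:
            raise ValueError("GGUF v1 is not handled by this sketch")
        _read(f, "<Q")             # tensor count (unused here)
        kv_count = _read(f, "<Q")  # number of metadata key/value pairs
        for _ in range(kv_count):
            key = _read_string(f)
            value = _read_value(f, _read(f, "<I"))
            if key == "general.file_type":
                return value
    return None


def claimed_quant(path):
    """Extract a label such as Q4_K_M from the filename, or None."""
    m = re.search(r"(?<![0-9A-Za-z])Q\d_(?:K_[SML]|K|0|1)(?![0-9A-Za-z])",
                  os.path.basename(path), re.IGNORECASE)
    return m.group(0).upper() if m else None


def label_matches(path):
    """Return (matches, claimed_label, actual_label) for a GGUF file."""
    claimed = claimed_quant(path)
    actual = FTYPE_NAMES.get(read_file_type(path))
    return claimed == actual, claimed, actual
```

Calling `label_matches("model.Q4_K_M.gguf")` returns a tuple whose first element is `False` when the declared file type disagrees with the filename. This only checks the declared `general.file_type`; it does not inspect per-tensor types, so a file with correct metadata but unusual tensor mixes would still pass.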
Journey Context:
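Hugging Face repos often contain mislabeled GGUFs (e.g., a file named Q4_K_M that actually holds Q4_0 or a Q5 mix). llama.cpp loads them without complaint, so the first symptom is either higher-than-expected memory use (if the real quantization uses more bits per weight) or a perplexity spike (if the tensors are poorly mixed). gguf-dump shows the declared file type and the type of every tensor straight from the file's metadata. Note that a genuine Q4_K_M file is itself mixed: most tensors are Q4_K, some are Q6_K, and norm tensors stay F32, so judge by the dominant type rather than expecting a single one. Checking up front saves hours of debugging 'why does 70B Q4 use 48GB VRAM' when the file is actually Q5_K_M. Alternative: trust the hashes published in the repo, but a hash only proves that the download matches the upload, not that the uploader labeled the file correctly.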
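The 48GB figure can be sanity-checked with simple arithmetic. The sketch below is a rough estimate under stated assumptions: a round 70e9 parameter count, every weight stored in the dominant block type, and no allowance for the KV cache or runtime buffers. The bits-per-weight values follow from the ggml block layouts (bytes per block times 8, divided by weights per block); the function name `weights_gb` is hypothetical.

```python
# Bits per weight for common ggml block types, derived from block layouts:
# (bytes per block * 8) / (weights per block).
BITS_PER_WEIGHT = {
    "Q4_0": 18 * 8 / 32,    # 4.5
    "Q4_K": 144 * 8 / 256,  # 4.5
    "Q5_K": 176 * 8 / 256,  # 5.5
    "Q6_K": 210 * 8 / 256,  # 6.5625
    "Q8_0": 34 * 8 / 32,    # 8.5
}


def weights_gb(n_params, block_type):
    """Approximate weight storage in GB (1e9 bytes) for one block type."""
    return n_params * BITS_PER_WEIGHT[block_type] / 8 / 1e9


if __name__ == "__main__":
    n_params = 70e9  # assumption: round figure for a "70B" model
    for block_type in ("Q4_K", "Q5_K"):
        print(f"{block_type}: {weights_gb(n_params, block_type):.1f} GB")
```

Under these assumptions, Q4_K comes to roughly 39 GB and Q5_K to roughly 48 GB, which is consistent with a file labeled Q4 that behaves like Q5_K_M. Real Q4_K_M and Q5_K_M files run somewhat larger than these floor values because some tensors are kept at higher precision.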
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T18:21:38.075550+00:00— report_created — created