Report #17998
[tooling] Quantized GGUF model produces garbage or unexpectedly high perplexity compared to reference
Use `gguf-dump` from gguf-py to inspect `general.name` and the per-tensor type metadata. For a file advertised as an imatrix (importance matrix) quant at `Q4_K_M` or above, verify that it was actually quantized with imatrix data and is not a standard quant.
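If gguf-py is not installed, the same fields can be read straight from the file header. The following is a minimal sketch, not the gguf-py implementation: it assumes a little-endian GGUF file of version 2 or later, and the function names (`read_gguf_header`, `tensor_type_counts`) and the partial `GGML_TYPES` table are illustrative choices made here.

```python
import struct
from collections import Counter

# GGUF metadata value type ids mapped to struct formats for fixed-size scalars.
_SCALAR = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
           6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9

# Partial table of ggml tensor type ids; unknown ids are reported as "type_<id>".
GGML_TYPES = {0: "F32", 1: "F16", 2: "Q4_0", 3: "Q4_1", 6: "Q5_0", 7: "Q5_1",
              8: "Q8_0", 10: "Q2_K", 11: "Q3_K", 12: "Q4_K", 13: "Q5_K",
              14: "Q6_K", 30: "BF16"}


def _read(f, fmt):
    """Read one fixed-size little-endian value from the stream."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF file")
    return struct.unpack(fmt, data)[0]


def _read_str(f):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    """Read a metadata value of the given GGUF value type."""
    if vtype in _SCALAR:
        return _read(f, _SCALAR[vtype])
    if vtype == _STRING:
        return _read_str(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_header(f):
    """Return (metadata dict, {tensor name: type name}) from a binary stream."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts and is not handled here")
    tensor_count = _read(f, "<Q")
    kv_count = _read(f, "<Q")

    metadata = {}
    for _ in range(kv_count):
        key = _read_str(f)
        value_type = _read(f, "<I")
        metadata[key] = _read_value(f, value_type)

    tensors = {}
    for _ in range(tensor_count):
        name = _read_str(f)
        n_dims = _read(f, "<I")
        f.read(8 * n_dims)       # skip the uint64 dimensions
        type_id = _read(f, "<I")
        _read(f, "<Q")           # skip the data offset
        tensors[name] = GGML_TYPES.get(type_id, f"type_{type_id}")
    return metadata, tensors


def tensor_type_counts(path):
    """Open a GGUF file and count how many tensors use each type."""
    with open(path, "rb") as f:
        metadata, tensors = read_gguf_header(f)
    return metadata.get("general.name"), Counter(tensors.values())
```

Calling `tensor_type_counts` on a local file path returns the model name and a per-type tensor count. Note that tensor types are per-tensor formats such as `Q4_K` or `Q6_K`; a mixed file type like `Q4_K_M` will not appear as a tensor type itself, so a file whose quantized tensors are all `Q4_0` is a quick sign that it is not the quant it claims to be.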
Journey Context:
An imatrix quant uses importance data, computed per tensor from calibration text, to target quantization at the weights that matter most, which preserves accuracy significantly better than a standard quant of the same type. Confusing imatrix and standard quants, or using the wrong `Q` type (e.g., `Q4_0` instead of `Q4_K_M`), causes significant quality degradation. Many download scripts don't distinguish imatrix GGUFs from standard ones.
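One way a download script could tell the two apart is to look for imatrix provenance in the file's metadata. The sketch below assumes the metadata has already been loaded into a plain `dict` of key to value, and that the file was produced by a llama.cpp `llama-quantize` build that records `quantize.imatrix.*` keys when an importance matrix is supplied. The helper names are hypothetical, and missing keys do not prove that no imatrix was used, since other tools may not write them.

```python
# Assumed key prefix written by llama.cpp's quantize tool when an imatrix is used.
IMATRIX_PREFIX = "quantize.imatrix."


def imatrix_info(metadata):
    """Return the quantize.imatrix.* entries found in a GGUF metadata mapping."""
    return {k: v for k, v in metadata.items() if k.startswith(IMATRIX_PREFIX)}


def describe_quant(metadata):
    """Summarize whether a file carries imatrix provenance metadata."""
    name = metadata.get("general.name", "<unnamed>")
    info = imatrix_info(metadata)
    if info:
        return f"{name}: imatrix metadata present ({len(info)} keys)"
    return f"{name}: no imatrix metadata found (standard quant, or keys not recorded)"
```

A script could call `describe_quant` on each downloaded file and flag any file whose name advertises an imatrix quant but whose metadata carries no such keys, leaving the final judgment to a perplexity comparison against the reference.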
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:54:49.550534+00:00— report_created — created