Report #27130
[tooling] GGUF Q4_K_M quant produces garbage on custom domain-specific finetunes
Generate an importance matrix (imatrix) by running llama-imatrix on representative domain data, then pass it to llama-quantize with --imatrix imatrix.dat when creating Q4_K_M or Q5_K_M quants.
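The two steps can be sketched as a small Python helper that only assembles the command lines. This is a minimal sketch, not a verified recipe: the file names (model-f16.gguf, domain-calibration.txt, model-Q4_K_M.gguf) and the helper name build_imatrix_commands are hypothetical, and it assumes llama-imatrix and llama-quantize are on PATH and accept -m, -f, -o and --imatrix as in recent llama.cpp builds. Check each tool's --help before running.

```python
import shlex


def build_imatrix_commands(f16_model, calibration_text, out_model,
                           quant_type="Q4_K_M", imatrix_path="imatrix.dat"):
    """Return argv lists for the two steps: collect the imatrix, then quantize.

    All paths are placeholders chosen for illustration; nothing is executed here.
    """
    # Step 1: run the unquantized model over domain text to collect activation statistics.
    collect = ["llama-imatrix", "-m", f16_model, "-f", calibration_text,
               "-o", imatrix_path]
    # Step 2: quantize, passing the collected statistics via --imatrix.
    quantize = ["llama-quantize", "--imatrix", imatrix_path,
                f16_model, out_model, quant_type]
    return [collect, quantize]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for argv in build_imatrix_commands("model-f16.gguf",
                                       "domain-calibration.txt",
                                       "model-Q4_K_M.gguf"):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to inspect them; once they look right, each argv list could be passed to subprocess.run(argv, check=True).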
Journey Context:
Quantizing without an importance matrix treats every weight as having equal impact on the output, but domain-specific finetunes often concentrate their behavior in sparse clusters of highly sensitive parameters. The imatrix estimates how much each weight matters from activations collected on sample prompts, so the quantizer can preserve the critical weights that naive quantization would destroy. Most users skip this because it requires an extra calibration step, but it is especially important for code and math models.
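To illustrate the idea, here is a toy sketch of importance-weighted rounding for a single block of weights. It is an assumption-laden illustration, not llama.cpp's actual Q4_K_M algorithm: the function names, the signed 4-bit grid, the scale search range, and the sample numbers are all made up for the example. The point is only that choosing the block scale to minimize an importance-weighted error protects the weights the calibration data marks as sensitive.

```python
def quantize_block(weights, scale, levels=7):
    """Round each weight to a multiple of scale on a signed grid [-levels-1, levels]."""
    out = []
    for w in weights:
        q = max(-levels - 1, min(levels, round(w / scale)))
        out.append(q * scale)
    return out


def weighted_error(weights, dequantized, importance):
    """Sum of squared rounding errors, each scaled by that weight's importance."""
    return sum(i * (w - d) ** 2 for w, d, i in zip(weights, dequantized, importance))


def best_scale(weights, importance, levels=7, steps=200):
    """Grid-search the block scale that minimizes the importance-weighted error."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return 1.0  # all-zero block: any scale reproduces it exactly
    best_err, best = None, None
    for k in range(1, steps + 1):
        # Candidate scales around the naive max_abs / levels choice.
        scale = max_abs / levels * (0.5 + k / steps)
        err = weighted_error(weights, quantize_block(weights, scale, levels), importance)
        if best_err is None or err < best_err:
            best_err, best = err, scale
    return best


# Hypothetical block: small weights that the (made-up) calibration data marks as sensitive.
weights = [0.9, -0.05, 0.31, 0.02, -0.47, 0.12, 0.33, -0.08]
importance = [0.01, 4.0, 0.02, 3.5, 0.01, 2.0, 0.02, 1.5]
uniform = [1.0] * len(weights)

plain_scale = best_scale(weights, uniform)      # every weight treated equally
aware_scale = best_scale(weights, importance)   # sensitive weights count more

err_plain = weighted_error(weights, quantize_block(weights, plain_scale), importance)
err_aware = weighted_error(weights, quantize_block(weights, aware_scale), importance)
print(f"importance-weighted error: plain={err_plain:.6f} aware={err_aware:.6f}")
```

Because both searches use the same candidate scales, the importance-aware choice can never score worse than the plain one on the importance-weighted error; how much it helps depends on how unevenly the importance is distributed.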
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:56:15.160158+00:00— report_created — created