Report #44659
[tooling] GGUF Q4_K_M quantization produces garbled output or high perplexity without imatrix calibration
Generate an importance matrix by running llama-imatrix on 100-1000 MB of representative text, then pass it to llama-quantize with --imatrix matrix.dat when quantizing to Q4_K_M or Q3_K_L.
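A minimal sketch of the two-step workflow, assuming the llama.cpp tools `llama-imatrix` and `llama-quantize` are on PATH. The helper name `build_imatrix_commands` and all file names are hypothetical; the function only builds the argument lists and leaves running them to the caller.

```python
import shlex


def build_imatrix_commands(model_path, calibration_path, output_path,
                           quant_type="Q4_K_M", imatrix_path="matrix.dat"):
    """Return (imatrix_cmd, quantize_cmd) as argv lists. Nothing is executed.

    All paths are placeholders chosen for illustration, not values from the report.
    """
    # Step 1: compute the importance matrix from representative text.
    imatrix_cmd = [
        "llama-imatrix",
        "-m", model_path,        # unquantized (e.g. F16) GGUF model
        "-f", calibration_path,  # calibration text file
        "-o", imatrix_path,      # where to write the importance matrix
    ]
    # Step 2: quantize, weighting the error by the importance matrix.
    quantize_cmd = [
        "llama-quantize",
        "--imatrix", imatrix_path,
        model_path,
        output_path,
        quant_type,              # e.g. Q4_K_M or Q3_K_L
    ]
    return imatrix_cmd, quantize_cmd


if __name__ == "__main__":
    for cmd in build_imatrix_commands("model-f16.gguf", "calibration.txt",
                                      "model-Q4_K_M.gguf"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the printed commands have been reviewed.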
Journey Context:
Without imatrix calibration, the quantizer treats every weight as equally important, so the few high-impact 'outlier' weights are rounded as coarsely as the rest, which leads to garbage outputs. Common mistakes include calibrating on random text, which does not reflect real usage, or on less than 10 MB of data, which is too little to be statistically meaningful. The 100-1000 MB calibration sweet spot balances perplexity gains against compute time. imatrix+Q4_K_M often outperforms non-calibrated IQ4_XS, and is essential for coding models, where weight outliers are critical.
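The size guidance above can be turned into a quick pre-flight check. This is a sketch only: the thresholds are the ones stated in this report, the function name and category labels are hypothetical, and 1 MB is assumed to mean 1,000,000 bytes.

```python
MB = 1_000_000  # assumption: decimal megabytes


def classify_calibration_size(n_bytes: int) -> str:
    """Classify a calibration file size against the thresholds in this report."""
    if n_bytes < 10 * MB:
        return "too small"          # under 10 MB: flagged as a common mistake
    if n_bytes < 100 * MB:
        return "below sweet spot"   # 10-100 MB: not covered by the report's range
    if n_bytes <= 1000 * MB:
        return "in sweet spot"      # 100-1000 MB: the report's recommended range
    return "above sweet spot"       # more data mainly costs extra compute time
```

For a real file, pass `os.path.getsize(path)` as `n_bytes`. The check covers size only; whether the text is representative of the intended workload still has to be judged by hand.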
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:25:38.885939+00:00— report_created — created