Report #44276
[tooling] IQ quants (IQ4_XS, IQ3_XXS) in llama.cpp producing garbage or high perplexity
Generate an importance matrix (imatrix) by running `llama-imatrix` on ~100MB of representative text data, then pass the result to `llama-quantize` with `--imatrix imatrix.dat` so that the IQ quantization is calibrated against real activations. This typically gives a large accuracy improvement for low-bit IQ quants.
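The two steps above can be sketched as a small Python wrapper that builds the command lines and, by default, only prints them. This is a minimal sketch, not a verified workflow: the file names (`model-f16.gguf`, `calibration.txt`, `model-IQ4_XS.gguf`) are hypothetical placeholders, and it assumes `llama-imatrix` and `llama-quantize` are already built and on your `PATH`.

```python
import shlex
import subprocess


def build_commands(model_f16, calib_text, imatrix_out, quant_out, quant_type="IQ4_XS"):
    """Return the two command lines as argv lists (nothing is executed here)."""
    # Step 1: collect activation statistics from the calibration text.
    imatrix_cmd = ["llama-imatrix", "-m", model_f16, "-f", calib_text, "-o", imatrix_out]
    # Step 2: quantize the full-precision model, guided by the imatrix.
    quantize_cmd = ["llama-quantize", "--imatrix", imatrix_out,
                    model_f16, quant_out, quant_type]
    return [imatrix_cmd, quantize_cmd]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # All file names below are placeholders (assumptions, not from the report).
    cmds = build_commands("model-f16.gguf", "calibration.txt",
                          "imatrix.dat", "model-IQ4_XS.gguf")
    run(cmds, dry_run=True)
```

Leaving `dry_run=True` lets you inspect the commands before anything runs. Comparing perplexity of the resulting model with and without the imatrix (for example with `llama-perplexity`) is one way to check whether the calibration helped.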
Journey Context:
IQ quants (such as IQ3_XXS and IQ4_XS) depend heavily on an importance matrix (imatrix): activation statistics collected from calibration text that indicate which weights are most sensitive. Quantizing without an imatrix treats every weight as equally important, so rounding error falls on critical weights as readily as on unimportant ones. Users often skip the imatrix generation step because it requires compiling `llama-imatrix` and sourcing data, or they use calibration data that does not match the model's intended use. Tradeoff: generating the matrix is a one-time compute cost, but it is essential for usable IQ3/IQ4 models.
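To illustrate why importance weighting matters, here is a toy sketch that picks a quantization scale for one block of weights, once treating all weights equally and once weighting the error by importance. It is not llama.cpp's actual quantization routine: the importance values, the block size, the number of levels, and the grid search over scales are all assumptions made for illustration.

```python
import random


def quantize(weights, scale, levels=7):
    """Round each weight to the nearest multiple of scale, clamped to +/- levels steps."""
    return [scale * max(-levels, min(levels, round(w / scale))) for w in weights]


def weighted_error(weights, quantized, importance):
    """Sum of squared rounding errors, each multiplied by its importance."""
    return sum(h * (w - q) ** 2 for w, q, h in zip(weights, quantized, importance))


def best_scale(weights, importance, candidates):
    """Pick the candidate scale with the lowest importance-weighted error."""
    return min(candidates,
               key=lambda s: weighted_error(weights, quantize(weights, s), importance))


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(32)]
# Hypothetical importance values: every 8th weight sees much larger activations.
importance = [100.0 if i % 8 == 0 else 1.0 for i in range(32)]
uniform = [1.0] * 32

max_abs = max(abs(w) for w in weights)
candidates = [max_abs / 7 * (0.5 + 0.01 * k) for k in range(1, 101)]

scale_plain = best_scale(weights, uniform, candidates)       # no imatrix
scale_imatrix = best_scale(weights, importance, candidates)  # with imatrix

# Judge both choices by the error on the weights that actually matter.
err_plain = weighted_error(weights, quantize(weights, scale_plain), importance)
err_imatrix = weighted_error(weights, quantize(weights, scale_imatrix), importance)
print(f"importance-weighted error without imatrix: {err_plain:.4f}")
print(f"importance-weighted error with imatrix:    {err_imatrix:.4f}")
```

Because both scales come from the same candidate grid, the importance-aware choice can never score worse on the importance-weighted error. How much better it scores depends on how uneven the importance values are.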
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:47:13.401211+00:00— report_created — created