Report #37982
[tooling] IQ2_XXS/IQ3_XXS GGUF models produce incoherent output
Generate an importance matrix with `llama-imatrix` on representative calibration data, then pass `--imatrix imatrix.dat` to `llama-quantize` when creating the IQ2_XXS/IQ3_XXS model.
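The two steps can be sketched as a small Python wrapper that builds the command lines and runs them only on request. This is a minimal sketch under stated assumptions: the file names are hypothetical placeholders, the `-m`, `-f`, and `-o` options for `llama-imatrix` and the positional argument order for `llama-quantize` reflect recent llama.cpp builds and should be checked against each tool's `--help` output, and both binaries are assumed to be on `PATH`.

```python
import shlex
import subprocess


def imatrix_cmd(model_f16, calibration_txt, imatrix_out="imatrix.dat"):
    """Build the command that computes the imatrix from the unquantized model."""
    return [
        "llama-imatrix",
        "-m", str(model_f16),        # full-precision GGUF (hypothetical path)
        "-f", str(calibration_txt),  # calibration text (hypothetical path)
        "-o", str(imatrix_out),      # where the imatrix is written
    ]


def quantize_cmd(model_f16, model_out, quant_type, imatrix="imatrix.dat"):
    """Build the command that quantizes with the imatrix applied."""
    return [
        "llama-quantize",
        "--imatrix", str(imatrix),
        str(model_f16),
        str(model_out),
        quant_type,                  # e.g. "IQ2_XXS" or "IQ3_XXS"
    ]


def run_pipeline(model_f16, calibration_txt, model_out,
                 quant_type="IQ2_XXS", dry_run=True):
    """Print both commands; execute them only when dry_run is False."""
    cmds = [
        imatrix_cmd(model_f16, calibration_txt),
        quantize_cmd(model_f16, model_out, quant_type),
    ]
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise if a step fails
    return cmds


if __name__ == "__main__":
    # Dry run: prints the commands without touching any model files.
    run_pipeline("model-f16.gguf", "calibration.txt", "model-iq2_xxs.gguf")
```

Keeping `dry_run=True` as the default lets the commands be reviewed before the long imatrix pass is started.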
Journey Context:
IQ (Importance-aware Quantization) quants prioritize accuracy for "important" weights, where importance is estimated from activation magnitudes recorded in the importance matrix (imatrix). Without the imatrix, the quantizer treats all weights equally, which causes catastrophic quality loss at IQ2/IQ3 bit widths. The imatrix must be generated from the *full* unquantized model using domain-representative text (e.g., WikiText for general-purpose models, source code for coding models). Skipping this step to save the roughly one hour it takes leaves the IQ quants unusable.
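To make the role of importance concrete, here is a toy illustration in Python. It is not llama.cpp's quantization algorithm: the weights, the importance values, the candidate scales, and the simple round-to-grid scheme are all hypothetical, chosen only to show that picking a scale by importance-weighted error can never do worse on the important weights than picking it while treating all weights equally.

```python
def quantize(values, scale, levels=2):
    """Round each value to the nearest multiple of scale, clamped to +/- levels steps."""
    out = []
    for v in values:
        q = max(-levels, min(levels, round(v / scale)))
        out.append(q * scale)
    return out


def weighted_error(values, approx, importance):
    """Sum of squared errors, each term scaled by its importance."""
    return sum(w * (v - a) ** 2 for v, a, w in zip(values, approx, importance))


def best_scale(values, importance, candidates, levels=2):
    """Pick the candidate scale with the lowest importance-weighted error."""
    return min(
        candidates,
        key=lambda s: weighted_error(values, quantize(values, s, levels), importance),
    )


# Hypothetical block of model weights and per-weight importance.
weights = [0.9, -0.35, 0.1, -0.05]
importance = [0.1, 5.0, 0.2, 0.1]      # second weight matters most
uniform = [1.0] * len(weights)         # "no imatrix": all weights equal
candidates = [0.05 * k for k in range(1, 21)]

s_plain = best_scale(weights, uniform, candidates)
s_imat = best_scale(weights, importance, candidates)

# Score both choices by the error that matters: the importance-weighted one.
err_plain = weighted_error(weights, quantize(weights, s_plain), importance)
err_imat = weighted_error(weights, quantize(weights, s_imat), importance)

print(f"scale without importance: {s_plain:.2f}, weighted error {err_plain:.4f}")
print(f"scale with importance:    {s_imat:.2f}, weighted error {err_imat:.4f}")
```

With only a handful of representable levels, as in 2-bit and 3-bit formats, the choice of which errors to tolerate has an outsized effect on the result.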
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:13:59.180098+00:00— report_created — created