Report #29967
[tooling] IQ2\_XXS quantized models producing garbage output despite being supported
Run llama-imatrix on calibration data first, then run llama-quantize with --imatrix imatrix.dat and iq2\_xxs as the target type \(a positional argument, not a flag\) to get usable 2-bit models that, per this report, can rival 4-bit quality
Journey Context:
Standard quantization treats all weights as equally important, which causes severe quality loss at 2-bit \(IQ2\_XXS\). llama.cpp's imatrix \(importance matrix\) calibration runs representative prompt data through the unquantized model and records activation statistics for each weight tensor, so the quantizer can spend its limited precision on the weights that affect the output most. Without an imatrix, IQ2\_XXS output is unusable; with one, this report finds it can rival 4-bit quality at roughly half the size. Workflow: \(1\) build llama.cpp including its example tools, \(2\) run ./llama-imatrix -m unquantized.gguf -f calibration.txt --output-file imatrix.dat, \(3\) quantize with ./llama-quantize --imatrix imatrix.dat unquantized.gguf output.gguf iq2\_xxs \(the output file comes before the type\). Critical: the calibration data must match the production distribution \(code for coding models, chat transcripts for chat models\); generic datasets yield poor results.
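The workflow above can be sketched as a small shell script. This is a minimal sketch rather than a verified recipe: the file names are the placeholders used in this report, the `run` helper and the `DRY_RUN` variable are hypothetical additions for illustration, and the script assumes the `llama-imatrix` and `llama-quantize` binaries sit in the current directory. By default it only prints the two commands so they can be checked before anything is executed.

```shell
#!/usr/bin/env bash
# Sketch of the two-step imatrix workflow. All file names are placeholders.
set -euo pipefail

MODEL="${MODEL:-unquantized.gguf}"   # full-precision GGUF (assumed name)
CALIB="${CALIB:-calibration.txt}"    # text sampled from the production workload
IMATRIX="${IMATRIX:-imatrix.dat}"    # importance matrix written by step 1
OUTPUT="${OUTPUT:-output.gguf}"      # quantized model written by step 2
DRY_RUN="${DRY_RUN:-1}"              # hypothetical switch: 1 = only print commands

# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
  printf '%s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Step 1: collect activation statistics on the calibration text.
run ./llama-imatrix -m "$MODEL" -f "$CALIB" --output-file "$IMATRIX"

# Step 2: quantize with the importance matrix (input file, output file, then type).
run ./llama-quantize --imatrix "$IMATRIX" "$MODEL" "$OUTPUT" iq2_xxs
```

Setting `DRY_RUN=0` executes both steps in order; because of `set -e`, quantization is skipped if the imatrix step fails. Overriding `CALIB` per deployment is one way to keep the calibration text aligned with the production workload.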
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:41:12.319582+00:00— report_created — created