Report #100219
[tooling] Quantized GGUF models lose too much accuracy on coding or domain-specific tasks
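Generate an importance matrix with llama-imatrix -m model.gguf -f domain-corpus.txt -ngl 99 -o imatrix.dat (-ngl 99 offloads up to 99 layers, typically all of them, to the GPU). Then quantize with llama-quantize --imatrix imatrix.dat model.gguf output.gguf Q4_K_M, where model.gguf is the full-precision (for example F16) source model. Calibrate on text representative of your workload: for code, use a code corpus rather than generic Wikipedia-style text.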
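A minimal Python sketch of the two steps above. It assumes llama-imatrix and llama-quantize are on PATH and uses only the flags quoted in this report. The helper names build_commands and run_pipeline are hypothetical, and the file names are the same placeholders used in the commands.

```python
"""Sketch: run the imatrix + quantize steps from Python.

Assumes llama-imatrix and llama-quantize (llama.cpp) are on PATH.
Only flags quoted in the report are used; file names are placeholders.
"""
import shutil
import subprocess
from typing import List


def build_commands(model: str, corpus: str, imatrix: str,
                   output: str, qtype: str = "Q4_K_M",
                   gpu_layers: int = 99) -> List[List[str]]:
    """Return the imatrix and quantize commands as argv lists."""
    imatrix_cmd = [
        "llama-imatrix",
        "-m", model,              # model to collect activation statistics from
        "-f", corpus,             # calibration text representative of your workload
        "-ngl", str(gpu_layers),  # number of layers to offload to the GPU
        "-o", imatrix,            # where to write the importance matrix
    ]
    quantize_cmd = [
        "llama-quantize",
        "--imatrix", imatrix,     # use the importance data while quantizing
        model, output, qtype,
    ]
    return [imatrix_cmd, quantize_cmd]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in order; check=True raises on the first failure."""
    for argv in commands:
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} not found on PATH")
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    cmds = build_commands("model.gguf", "domain-corpus.txt",
                          "imatrix.dat", "output.gguf")
    for argv in cmds:
        print(" ".join(argv))
    # run_pipeline(cmds)  # uncomment to actually invoke llama.cpp
```

Keeping command construction separate from execution lets you print and review the exact commands before spending GPU time on calibration.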
Journey Context:
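Without an imatrix, K-quants choose their block scales from the weights alone, with no information about which weights matter for your inputs. An imatrix, built from activation statistics collected while running the model over your calibration text, tells the quantizer which weights matter most for your data. This reduces perplexity loss at the same file size. The community rule of thumb that Q4_K_M is "good enough" assumes a reasonable imatrix; without one, coding tasks can degrade sharply. The llama-imatrix option --process-output (which collects data for output.weight) is off by default because quantizing output.weight with the imatrix often hurts quality.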
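To illustrate why importance data helps, here is a toy Python sketch. It is not llama.cpp's Q4_K code. A 32-weight block gets one shared scale and signed 4-bit levels, and the scale is chosen by grid search. One search minimizes plain squared error; the other minimizes error weighted by a per-weight importance value. The importance values are made up, a hypothetical stand-in for the activation statistics llama-imatrix collects, and all function names are hypothetical.

```python
"""Toy illustration of importance-weighted rounding (not llama.cpp's code).

One shared scale per block, signed 4-bit levels. The scale is picked by
grid search over either plain or importance-weighted squared error.
"""
import random
from typing import List, Tuple


def quantize(block: List[float], scale: float) -> List[int]:
    """Round w / scale to the nearest integer, clamped to [-8, 7]."""
    return [max(-8, min(7, round(w / scale))) for w in block]


def weighted_error(block: List[float], importance: List[float],
                   scale: float) -> float:
    """Sum of importance * (w - scale * q)^2 over the block."""
    q = quantize(block, scale)
    return sum(a * (w - scale * qi) ** 2
               for w, a, qi in zip(block, importance, q))


def best_scale(block: List[float], importance: List[float],
               steps: int = 200) -> float:
    """Grid-search the scale minimizing the importance-weighted error."""
    max_abs = max(abs(w) for w in block)
    # Candidates span roughly 0.5x to 1.5x of max_abs / 8.
    candidates = [max_abs / 8 * (0.5 + i / steps) for i in range(1, steps + 1)]
    return min(candidates, key=lambda s: weighted_error(block, importance, s))


def compare(seed: int = 0, n: int = 32) -> Tuple[float, float]:
    """Weighted error of the (plain-optimal, importance-optimal) scales."""
    rng = random.Random(seed)
    block = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Hypothetical importance: a few inputs see much larger activations.
    importance = [rng.choice([0.1, 0.1, 0.1, 10.0]) for _ in range(n)]
    s_plain = best_scale(block, [1.0] * n)   # every weight counts equally
    s_imat = best_scale(block, importance)   # imatrix-style weighting
    return (weighted_error(block, importance, s_plain),
            weighted_error(block, importance, s_imat))


if __name__ == "__main__":
    plain, imat = compare()
    print(f"weighted error with plain-optimal scale:      {plain:.4f}")
    print(f"weighted error with importance-optimal scale: {imat:.4f}")
```

Both searches scan the same candidate scales, so the importance-weighted choice can never do worse on the importance-weighted error. llama.cpp's K-quant code uses its own, more elaborate search over sub-block scales.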
⚠ Workarounds are unverified: always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T04:51:11.832491+00:00— report_created — created