Report #72280
[tooling] GGUF Q4\_K\_M model produces incoherent output on domain-specific text
Generate an importance matrix \(imatrix\) with \`llama-imatrix\` on a representative corpus \(100MB\+ of target-domain text\), then quantize with \`llama-quantize --imatrix imatrix.dat ...\`. If file size is the constraint, prefer Q4\_K\_S with an imatrix over Q4\_K\_M without one; Q4\_K\_S is the slightly smaller of the two.
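The two steps above can be sketched as a small Python helper that only builds and prints the command lines. This is a minimal sketch, not a verified workflow: the file names \(model-f16.gguf, domain-corpus.txt, model-Q4\_K\_S.gguf\) and the helper functions are hypothetical, and it assumes both tools are on PATH and accept the \`-m\`, \`-f\`, \`-o\` and \`--imatrix\` options as in recent llama.cpp builds.

```python
import shlex


def imatrix_cmd(model, corpus, out="imatrix.dat"):
    """Build the argv for collecting an importance matrix from a calibration corpus."""
    # -m: unquantized source model, -f: calibration text, -o: output imatrix file
    return ["llama-imatrix", "-m", model, "-f", corpus, "-o", out]


def quantize_cmd(imatrix, src, dst, qtype="Q4_K_S"):
    """Build the argv for quantizing with the importance matrix applied."""
    # Without --imatrix, llama-quantize does not use the imatrix file at all.
    return ["llama-quantize", "--imatrix", imatrix, src, dst, qtype]


# Hypothetical file names, for illustration only.
steps = [
    imatrix_cmd("model-f16.gguf", "domain-corpus.txt"),
    quantize_cmd("imatrix.dat", "model-f16.gguf", "model-Q4_K_S.gguf"),
]

# Print the commands instead of running them, so they can be checked first.
for argv in steps:
    print(shlex.join(argv))
```

To execute a step after reviewing it, pass the list to \`subprocess.run(argv, check=True)\`.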
Journey Context:
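Standard K-quants rely on global heuristics and have no information about which weights matter for a given domain, which can hurt on niche jargon. The imatrix records activation statistics for each tensor over the calibration data, and the quantizer uses them to weight rounding error so that the weights that matter most on that data are preserved more precisely. Users often skip this step because it requires building \`llama-imatrix\` and supplying a corpus, but it generally lowers perplexity on the target domain at the same bit width. An imatrix file has no effect unless it is passed to \`llama-quantize\` with the \`--imatrix\` flag. Alternatives such as quantization-aware training require retraining the model and cannot be applied as a post-hoc step.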
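To illustrate why importance weighting helps, here is a toy sketch of the idea. It is not llama.cpp's actual algorithm: the block size, the 15-level symmetric grid, the scale search, and the synthetic importance values are all assumptions made for the example. It quantizes one block of weights twice, once choosing the scale by plain squared error and once by importance-weighted squared error, then compares both on the weighted metric.

```python
import random


def weighted_error(weights, importance, scale, q):
    """Importance-weighted squared reconstruction error for one block."""
    return sum(i * (w - scale * qi) ** 2 for i, w, qi in zip(importance, weights, q))


def quantize_block(weights, importance, levels=7, steps=64):
    """Grid-search a scale for symmetric integer rounding in [-levels, levels]
    that minimizes the importance-weighted squared error."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return 0.0, [0] * len(weights)
    best = None
    for k in range(steps):
        # Candidate scales around the naive max_abs / levels choice.
        scale = (max_abs / levels) * (0.7 + 0.6 * k / (steps - 1))
        q = [max(-levels, min(levels, round(w / scale))) for w in weights]
        err = weighted_error(weights, importance, scale, q)
        if best is None or err < best[0]:
            best = (err, scale, q)
    return best[1], best[2]


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(32)]
# Synthetic importance: every 8th weight is assumed to see much larger activations.
importance = [100.0 if n % 8 == 0 else 1.0 for n in range(32)]
uniform = [1.0] * len(weights)

s_plain, q_plain = quantize_block(weights, uniform)     # no imatrix
s_imat, q_imat = quantize_block(weights, importance)    # with imatrix-style weights

err_plain = weighted_error(weights, importance, s_plain, q_plain)
err_imat = weighted_error(weights, importance, s_imat, q_imat)
print(f"weighted error without importance: {err_plain:.4f}")
print(f"weighted error with importance:    {err_imat:.4f}")
```

Because both searches cover the same candidate scales, the importance-weighted choice can never score worse on the weighted metric. How much it improves depends on how uneven the importance values are, which is why the calibration corpus needs to resemble the target domain.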
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:54:32.802524+00:00— report_created — created