Report #38737
[tooling] Quantized GGUF model has high perplexity degradation on domain-specific text
Generate an importance matrix (imatrix) with llama-imatrix on representative domain data, then quantize with llama-quantize --imatrix file.imatrix to preserve critical weights.
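The two steps can be sketched as a small Python driver. The file names (model-f16.gguf, domain-calibration.txt, model-Q4_K_M.gguf) and the helper names are hypothetical. The flags used are the basic ones: -m, -f and -o for llama-imatrix, and --imatrix for llama-quantize. Check --help on your build, because options and the default imatrix output format have changed between llama.cpp releases.

```python
import shlex
import subprocess

# Hypothetical paths: substitute your own files.
BASE_MODEL = "model-f16.gguf"          # high-precision source model
CALIB_TEXT = "domain-calibration.txt"  # representative domain text (code, medical, legal, ...)
IMATRIX = "file.imatrix"
OUT_MODEL = "model-Q4_K_M.gguf"


def imatrix_cmd(model: str, calib: str, out: str) -> list[str]:
    """Collect activation statistics for every weight matrix over the calibration text."""
    return ["llama-imatrix", "-m", model, "-f", calib, "-o", out]


def quantize_cmd(model: str, imatrix: str, out: str, qtype: str = "Q4_K_M") -> list[str]:
    """Quantize, using the importance matrix to weight the quantization error."""
    return ["llama-quantize", "--imatrix", imatrix, model, out, qtype]


def run_pipeline(dry_run: bool = True) -> list[list[str]]:
    """Print both commands; execute them in order only when dry_run is False."""
    cmds = [
        imatrix_cmd(BASE_MODEL, CALIB_TEXT, IMATRIX),
        quantize_cmd(BASE_MODEL, IMATRIX, OUT_MODEL),
    ]
    for cmd in cmds:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # abort if a step fails
    return cmds
```

The default only prints the commands. Call run_pipeline(dry_run=False) to execute them. Quantizing from the F16 (or other high-precision) GGUF, as here, avoids stacking two rounding steps.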
Journey Context:
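Standard quantization minimizes plain rounding error and treats every weight within a block as equally important. Domain-specific data (code, medical, legal) has skewed token distributions, so on that data some weights contribute far more to the model's output than others. llama-imatrix runs the model over calibration text and accumulates, for each weight matrix, the squared input activations per column. llama-quantize then uses these values to weight its error minimization, so the weights that matter most on that data are reproduced more accurately. Without this, Q4_K_M can degrade perplexity by 15-20% on specialized corpora; with an imatrix, degradation drops to 2-3%.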
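The mechanism can be illustrated with a simplified pure-Python sketch. It rests on two assumptions, both simplifications of what llama.cpp actually does:

- A column's importance is its mean squared activation over the calibration tokens.
- The quantizer picks a block scale that minimizes importance-weighted squared error.

It uses plain symmetric 4-bit round-to-nearest. The real K-quant formats (Q4_K_M and others) use super-blocks, per-sub-block scales and mins, and a different search. All function names are hypothetical.

```python
def importance_from_activations(activations: list[list[float]]) -> list[float]:
    """Per-column importance: mean squared activation over calibration tokens.

    activations[t][j] is the value fed into input column j of one weight
    matrix at calibration token t.
    """
    n_tokens = len(activations)
    imp = [0.0] * len(activations[0])
    for row in activations:
        for j, x in enumerate(row):
            imp[j] += x * x
    return [v / n_tokens for v in imp]


def weighted_error(weights, scale, q, importance) -> float:
    """Importance-weighted squared reconstruction error of one block."""
    return sum(imp * (w - scale * qi) ** 2
               for w, qi, imp in zip(weights, q, importance))


def quantize_block(weights, importance, bits=4, n_candidates=64):
    """Symmetric round-to-nearest; choose the scale that minimizes weighted error."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    amax = max(abs(w) for w in weights)
    if amax == 0.0:
        return 0.0, [0] * len(weights)
    best = None
    for k in range(n_candidates):
        # Candidate scales from amax/qmax down to roughly half of that.
        scale = amax / qmax * (1.0 - 0.5 * k / n_candidates)
        q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
        err = weighted_error(weights, scale, q, importance)
        if best is None or err < best[0]:
            best = (err, scale, q)
    return best[1], best[2]
```

You can quantize the same block twice: once with uniform importance ([1.0] * len(weights)) and once with importance_from_activations(...). Score both results with weighted_error under the activation-derived importance. The second result can never be worse on that metric, because it is the one the search optimized. This is the effect the imatrix has on columns that see large activations in domain text.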
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:29:52.995011+00:00— report_created — created