Report #26365
[tooling] Quantized 70B model to Q4\_K\_M but quality degradation unacceptable on domain-specific code
Generate an importance matrix \(imatrix\) by running ./llama-imatrix against calibration data from your domain, then quantize with ./llama-quantize --imatrix imatrix.dat. This preserves the most critical weights and brings Q4\_K\_M quality to a level comparable to Q5\_K\_M quantized without an imatrix.
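The two steps can be sketched as a small script. This is a minimal sketch, not a verified recipe: the file names (model-f16.gguf, calibration.txt, model-Q4\_K\_M.gguf) are hypothetical placeholders, and the flags follow the usage printed by the llama.cpp tools, so check ./llama-imatrix --help and ./llama-quantize --help for your build. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
set -eu

# Hypothetical paths (assumptions, not from the report); override via environment.
MODEL_F16="${MODEL_F16:-model-f16.gguf}"   # unquantized source model
CALIB="${CALIB:-calibration.txt}"          # domain-specific calibration text
IMATRIX="${IMATRIX:-imatrix.dat}"          # importance matrix output
OUT="${OUT:-model-Q4_K_M.gguf}"            # quantized result

# Dry run by default: print the command instead of executing it.
# Set DRY_RUN=0 to actually run the tools.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: compute the importance matrix on the unquantized model.
step_imatrix() {
  run ./llama-imatrix -m "$MODEL_F16" -f "$CALIB" -o "$IMATRIX"
}

# Step 2: quantize to Q4_K_M, passing the imatrix to the quantizer.
step_quantize() {
  run ./llama-quantize --imatrix "$IMATRIX" "$MODEL_F16" "$OUT" Q4_K_M
}

step_imatrix
step_quantize
```

Run it once as-is to inspect the printed commands, then rerun with DRY\_RUN=0 after confirming the paths and flags match your setup.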
Journey Context:
Standard quantization does not account for which weights matter most on your data, so domain-specific knowledge \(e.g., rare code patterns\) can be lost. The imatrix estimates activation-based importance from calibration data, and the quantizer uses it to weight rounding error so that high-saliency weights are preserved. Process: run imatrix generation on the unquantized model with ~100-1000 MB of representative text \(code, legal, medical\), then pass the resulting file to the quantize step. Result: significant perplexity reduction vs standard Q4. Particularly useful for RAG over specialized corpora.
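Assembling the representative text can be as simple as concatenating files from your own corpus. The helper below is a hypothetical sketch: the function name build\_calibration and the \*.py file pattern are assumptions chosen for a code-focused domain, so swap the pattern for whatever matches your data.

```shell
#!/bin/sh
set -eu

# build_calibration SRC_DIR OUT_FILE
# Concatenates every *.py file under SRC_DIR (in sorted order, for
# reproducibility) into OUT_FILE, with a blank line after each file.
# The *.py pattern is an assumption; adjust it to your domain.
build_calibration() {
  src_dir=$1
  out_file=$2
  : > "$out_file"   # create or truncate the output file
  find "$src_dir" -type f -name '*.py' | LC_ALL=C sort | while IFS= read -r f; do
    cat "$f" >> "$out_file"
    printf '\n' >> "$out_file"
  done
}

# Only run when called with two arguments, e.g.:
#   sh build_calibration.sh ./my_repo calibration.txt
if [ "$#" -eq 2 ]; then
  build_calibration "$1" "$2"
fi
```

The resulting file is what you would pass to ./llama-imatrix as the calibration input.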
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:39:09.922839+00:00— report_created — created