Report #12402
[tooling] Downloaded GGUF reports the wrong context length (e.g., 4k) or RoPE scaling although the model supports 128k; need to fix it without re-quantizing a 70B file
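Use llama.cpp's --override-kv flag, which expects KEY=TYPE:VALUE with TYPE one of int, float, bool, or str (e.g., --override-kv llama.context_length=int:128000 --override-kv llama.rope.freq_base=float:500000), or edit the metadata in place with gguf-py/scripts/gguf-set-metadata.py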
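A minimal sketch of how the typed override arguments could be assembled, assuming the KEY=TYPE:VALUE form and one --override-kv per key. The helper name override_kv_args is hypothetical and the values are the ones quoted in this report, not verified settings for any particular model.

```python
def override_kv_args(overrides):
    """Turn {key: value} into repeated --override-kv KEY=TYPE:VALUE arguments.

    Hypothetical helper for illustration; it is not part of llama.cpp.
    """
    args = []
    for key, value in overrides.items():
        # bool must be checked before int because bool is a subclass of int.
        if isinstance(value, bool):
            typed = f"bool:{'true' if value else 'false'}"
        elif isinstance(value, int):
            typed = f"int:{value}"
        elif isinstance(value, float):
            typed = f"float:{value}"
        else:
            typed = f"str:{value}"
        args += ["--override-kv", f"{key}={typed}"]
    return args


# Values taken from the report above; check them against the model card.
example = override_kv_args({
    "llama.context_length": 128000,
    "llama.rope.freq_base": 500000.0,
})
print(" ".join(example))
```

The resulting list can be appended to whatever llama.cpp command line is already in use, for instance through subprocess.run.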
Journey Context:
Many community GGUFs are converted with default 4k/8k context metadata even though the base model supports 128k via YaRN/RoPE scaling. Re-converting 70B weights takes hours and requires the original safetensors. llama.cpp can override GGUF key-value pairs at load time via --override-kv; each override is written as KEY=TYPE:VALUE, and the flag can be repeated once per key (a comma-separated list may not be accepted by every build). The critical detail is that the RoPE base (freq_base) must match what the long-context model expects (typically 500k-1M for 128k), and the YaRN parameters (yarn_ext_factor) must also be set if the model was trained with YaRN. Editing the GGUF file permanently via gguf-set-metadata.py is the preferred alternative for production, since it avoids CLI bloat.
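Before overriding anything, it helps to see what the file currently declares. The following is a rough sketch of a standard-library reader for the GGUF metadata header, assuming the little-endian GGUF v2/v3 layout (magic, version, tensor count, key-value count, then typed key-value pairs). The function name read_gguf_metadata is hypothetical; the gguf-py package ships its own reader, which is the better choice when it is installed.

```python
import struct

# Scalar GGUF value type ids mapped to struct formats (little-endian assumed).
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    # Read exactly one value of the given struct format from the stream.
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]


def _read_string(f):
    # GGUF strings are a uint64 byte length followed by UTF-8 bytes.
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        # Arrays carry an element type and a count; tokenizer arrays can be large.
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f):
    """Return the metadata key-value pairs of a GGUF stream opened in binary mode."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses a different layout")
    _read(f, "<Q")  # tensor count, not needed here
    kv_count = _read(f, "<Q")
    metadata = {}
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        metadata[key] = _read_value(f, vtype)
    return metadata


# Usage (the path is a placeholder):
# with open("model.gguf", "rb") as fh:
#     meta = read_gguf_metadata(fh)
#     print(meta.get("llama.context_length"), meta.get("llama.rope.freq_base"))
```

Only the header is parsed, so the tensor data of a 70B file is never read; keys under other architecture prefixes than llama would need the matching prefix.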
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T15:51:57.170685+00:00— report_created — created