Report #87636
[tooling] GGUF model rejects long context (e.g., 128k) despite metadata claiming support, or RoPE scaling fails
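Override the GGUF metadata at load time instead of converting the model again: --override-kv llama.context_length=int:131072 --override-kv llama.rope.freq_scale=float:0.25. Each override is written as KEY=TYPE:VALUE, where TYPE is int, float, bool, or str. The value 0.25 corresponds to stretching a 32768-token model to 131072 tokens; adjust it to the model's actual original length.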
Journey Context:
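Many GGUFs are converted with a default context length (e.g., 4096) baked into the metadata, even if the underlying model supports 128k via RoPE scaling. Users often resort to re-running llama.cpp's convert.py (and re-quantizing the result) just to change metadata, wasting hours. The --override-kv flag (available in recent llama.cpp builds) overrides metadata key-value pairs at load time, before they are read into the model's hparams. Each pair is written as KEY=TYPE:VALUE, where TYPE is int, float, bool, or str. Critical pairs: llama.context_length (int), llama.rope.freq_scale (float, usually original_length / new_length, e.g. 32768 / 131072 = 0.25), and llama.rope.scale_linear (given as a bool in this report; the GGUF spec lists it as a deprecated float). Key names vary by architecture and llama.cpp version: the GGUF spec names the current RoPE scaling keys llama.rope.scaling.type and llama.rope.scaling.factor, so check the model's actual metadata before relying on the names above. This enables dynamic context scaling without modifying the file.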
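The scale arithmetic and the argument format can be sketched in a few lines of Python. This is only an illustration: the helper names rope_freq_scale and override_kv_args are hypothetical, the key names are copied from this report and may not match every build, and the sketch assumes linear scaling where the frequency scale is original length divided by target length.

```python
def rope_freq_scale(original_ctx: int, target_ctx: int) -> float:
    """Linear RoPE frequency scale: original length divided by target length."""
    if original_ctx <= 0 or target_ctx <= 0:
        raise ValueError("context lengths must be positive")
    return original_ctx / target_ctx


def override_kv_args(original_ctx: int, target_ctx: int, arch: str = "llama") -> list[str]:
    """Build --override-kv arguments in KEY=TYPE:VALUE form.

    Assumption: the key names below are the ones named in this report;
    verify them against the model's metadata before use.
    """
    scale = rope_freq_scale(original_ctx, target_ctx)
    return [
        "--override-kv", f"{arch}.context_length=int:{target_ctx}",
        "--override-kv", f"{arch}.rope.freq_scale=float:{scale}",
    ]


if __name__ == "__main__":
    # Example: a model trained at 32768 tokens, stretched to 131072.
    print(" ".join(override_kv_args(32768, 131072)))
```

The returned list can be appended to whatever llama.cpp command line is already in use. For a model whose original length is 4096, the same target gives a scale of 0.03125 rather than 0.25, so the original length should be confirmed first.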
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:41:01.008330+00:00— report_created — created