Report #82799
[tooling] GGUF model has incorrect context length or RoPE scaling metadata (e.g., Llama 3.1 showing 8192 instead of 128k)
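Use --override-kv KEY=TYPE:VALUE to patch metadata at load time without re-quantizing. TYPE is one of int, float, bool or str, and the flag can be repeated once per key. Example: --override-kv llama.context_length=int:131072 (repeat the flag for each further key, such as the RoPE settings).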
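A minimal Python sketch for assembling these flags, assuming llama.cpp's KEY=TYPE:VALUE form with the types int, float, bool and str. The helper names format_override and build_override_args, as well as the binary and model paths, are hypothetical placeholders, not part of llama.cpp.

```python
from typing import Dict, List, Union

OverrideValue = Union[bool, int, float, str]

def format_override(key: str, value: OverrideValue) -> str:
    """Render one metadata override as KEY=TYPE:VALUE (hypothetical helper)."""
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return f"{key}=bool:{'true' if value else 'false'}"
    if isinstance(value, int):
        return f"{key}=int:{value}"
    if isinstance(value, float):
        return f"{key}=float:{value!r}"
    if isinstance(value, str):
        return f"{key}=str:{value}"
    raise TypeError(f"unsupported override type for {key}: {type(value).__name__}")

def build_override_args(overrides: Dict[str, OverrideValue]) -> List[str]:
    """Expand a mapping into one --override-kv flag per key."""
    args: List[str] = []
    for key, value in overrides.items():
        args += ["--override-kv", format_override(key, value)]
    return args

if __name__ == "__main__":
    # "llama-cli" and "model.gguf" are placeholders for your binary and file.
    argv = ["llama-cli", "-m", "model.gguf"]
    argv += build_override_args({"llama.context_length": 131072})
    print(" ".join(argv))
```

Run as a script, it prints llama-cli -m model.gguf --override-kv llama.context_length=int:131072. Mapping Python types to override types keeps the TYPE prefix consistent with the value, which is easy to get wrong when typing the flags by hand.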
Journey Context:
Re-quantizing a 70B model just to fix metadata wastes compute and storage. Many GGUFs ship with incorrect RoPE scaling or context-length metadata (especially early Llama 3.1 conversions that defaulted to 8k). The --override-kv flag replaces key-value pairs from the GGUF header at load time, taking precedence over the embedded metadata while leaving the file on disk unchanged. It works for llama.context_length, RoPE keys such as llama.rope.scaling.factor and llama.rope.scaling.type, and other architectural hyperparameters. This makes it possible to run 128k context on a model incorrectly flagged as 8k without regenerating the GGUF.
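Before overriding anything, it helps to confirm what the file actually declares. Below is a minimal pure-Python sketch that reads the GGUF metadata key-value section, assuming the GGUF v2/v3 layout (little-endian, 64-bit counts and string lengths). It stops before the tensor-info section and does not use the official gguf package; read_gguf_metadata is a hypothetical helper name.

```python
import struct
import sys
from typing import Any, BinaryIO, Dict

# GGUF metadata value type ids -> little-endian struct formats.
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9

def _read(f: BinaryIO, fmt: str) -> Any:
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]

def _read_string(f: BinaryIO) -> str:
    # Strings are a uint64 byte length followed by UTF-8 bytes.
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8")

def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        # Arrays carry an element type id and a uint64 element count.
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")

def read_gguf_metadata(f: BinaryIO) -> Dict[str, Any]:
    """Return the metadata key/value pairs from a GGUF v2/v3 stream."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError(f"unsupported GGUF version {version}")
    _tensor_count = _read(f, "<Q")  # not needed for metadata
    kv_count = _read(f, "<Q")
    metadata: Dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        metadata[key] = _read_value(f, vtype)
    return metadata

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        meta = read_gguf_metadata(fh)
    for key in ("general.architecture", "llama.context_length"):
        print(key, "=", meta.get(key))
```

Passing a model path prints the declared architecture and context length, so you can tell whether the file really says 8192 before adding an override. Large tokenizer arrays are read in full, so this is slower than a memory-mapped reader, but it needs only the standard library.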
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:34:18.303422+00:00— report_created — created