Report #9143
[tooling] Need to increase context size or change RoPE scaling on existing GGUF without re-quantizing
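Use llama.cpp's `gguf-set-metadata` script (shipped with the `gguf` Python package under `gguf-py`) to patch the context-length or RoPE-scaling metadata directly in the GGUF file. On disk the keys are architecture-prefixed strings such as `llama.context_length` and `llama.rope.scaling.factor`; llama.cpp's source refers to them as `LLM_KV_CONTEXT_LENGTH`, `LLM_KV_ROPE_SCALING_FACTOR`, and the legacy `LLM_KV_ROPE_SCALE_LINEAR`. The change applies the next time the model is loaded, with no need to regenerate tensors.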
Journey Context:
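Re-quantizing a 70B model takes hours and wastes compute. Many users assume metadata like context length is baked into the tensor data, but GGUF stores metadata key-value pairs separately from the tensor data. The `gguf-set-metadata` tool (a Python script in llama.cpp's `gguf-py`, also installed by `pip install gguf`) allows surgical edits. It appears to overwrite only existing fixed-size values in place; adding a missing key or changing a string value such as the scaling type may require `gguf-new-metadata`, which writes a modified copy of the file. Tradeoff: if you increase context beyond what the model was trained on, you also need to adjust RoPE scaling (linear or YaRN, or an NTK-style frequency-base change) via the `rope.scaling.type` and `rope.scaling.factor` keys (`LLM_KV_ROPE_SCALING_TYPE` and `LLM_KV_ROPE_SCALING_FACTOR` in the source), or output quality degrades. This makes it possible to test different context lengths on a single quantized file; for one-off experiments, runtime options such as `--ctx-size`, `--rope-scaling`, `--rope-scale`, and `--override-kv` achieve the same without modifying the file.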
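To illustrate why a metadata edit does not touch tensors, here is a minimal sketch in plain Python. It assumes the GGUF v3 little-endian layout (magic, version, tensor count, KV count, then key/type/value records) and works on a toy header built in memory with zero tensors. `build_header` and `patch_scalar` are hypothetical helpers written for this example, not part of llama.cpp or the `gguf` package, and the key names are assumptions for a LLaMA-architecture file; dump your own file's metadata to confirm them. For real files, prefer the official tool and work on a copy.

```python
import struct

# GGUF value-type ids -> struct format (scalars only; little-endian assumed)
SCALAR_FMT = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
              6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
GGUF_STRING = 8


def _gguf_str(s):
    """Encode a GGUF string: uint64 byte length followed by UTF-8 bytes."""
    raw = s.encode("utf-8")
    return struct.pack("<Q", len(raw)) + raw


def build_header(kvs):
    """Build a toy GGUF v3 header with zero tensors from (key, type_id, value)."""
    out = bytearray(b"GGUF" + struct.pack("<IQQ", 3, 0, len(kvs)))
    for key, type_id, value in kvs:
        out += _gguf_str(key) + struct.pack("<I", type_id)
        if type_id == GGUF_STRING:
            out += _gguf_str(value)
        else:
            out += struct.pack(SCALAR_FMT[type_id], value)
    return out


def patch_scalar(buf, key, new_value):
    """Overwrite one scalar metadata value in place and return the old value."""
    if bytes(buf[:4]) != b"GGUF":
        raise ValueError("not a GGUF buffer")
    _version, _n_tensors, n_kv = struct.unpack_from("<IQQ", buf, 4)
    pos = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (KV count)
    for _ in range(n_kv):
        (klen,) = struct.unpack_from("<Q", buf, pos)
        name = bytes(buf[pos + 8:pos + 8 + klen]).decode("utf-8")
        pos += 8 + klen
        (type_id,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        if type_id == GGUF_STRING:
            (vlen,) = struct.unpack_from("<Q", buf, pos)
            size = 8 + vlen
        elif type_id in SCALAR_FMT:
            size = struct.calcsize(SCALAR_FMT[type_id])
        else:
            raise NotImplementedError("array values are not handled in this sketch")
        if name == key:
            if type_id not in SCALAR_FMT:
                raise TypeError("only fixed-size scalars can be patched in place")
            (old,) = struct.unpack_from(SCALAR_FMT[type_id], buf, pos)
            struct.pack_into(SCALAR_FMT[type_id], buf, pos, new_value)
            return old
        pos += size
    raise KeyError(key)


# Toy header: one string key, a uint32 context length, a float32 scaling factor.
header = build_header([
    ("general.architecture", GGUF_STRING, "llama"),
    ("llama.context_length", 4, 4096),
    ("llama.rope.scaling.factor", 6, 1.0),
])
size_before = len(header)

# Raise the context to 16384 and set a matching linear scaling factor.
old_ctx = patch_scalar(header, "llama.context_length", 16384)
old_factor = patch_scalar(header, "llama.rope.scaling.factor", 16384 / old_ctx)
```

The header length is unchanged after both patches, which is why everything after the metadata (including tensor data in a real file) keeps its position. It also shows the limitation: a string value or a new key would change the header size, so it cannot be handled by an in-place overwrite.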
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T07:21:38.738105+00:00— report_created — created