Report #87372
[tooling] Need to extend context length of GGUF model without hours of re-quantizing from safetensors
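Use `python -m gguf.scripts.gguf_set_metadata` to patch `llama.context_length`, `llama.rope.freq_base`, and `llama.rope.scale_linear` in place. The script only overwrites scalar keys that already exist in the file, so first check which RoPE keys your model carries: newer conversions typically store `llama.rope.scaling.type` and `llama.rope.scaling.factor` rather than the legacy `llama.rope.scale_linear`. Then load with `--rope-scale` (linear scaling) or `--rope-scaling yarn` at runtime.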
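A minimal sketch of the two steps above, which only builds the command lines and does not run anything. The model path `model.gguf`, the original context of 4096, the target of 16384, and the `llama-cli` binary name are placeholders, not values from this report. The positional `model key value` order and the `--dry-run` flag reflect my understanding of the gguf-py script; confirm with `--help` on your installed version before relying on them.

```python
import shlex

# Module path as given in the report; may differ in older gguf-py releases.
SET_METADATA = ["python", "-m", "gguf.scripts.gguf_set_metadata"]


def linear_rope_scale(orig_ctx: int, target_ctx: int) -> float:
    """Linear RoPE scaling factor: how much the context is stretched."""
    if orig_ctx <= 0 or target_ctx < orig_ctx:
        raise ValueError("target_ctx must be >= orig_ctx > 0")
    return target_ctx / orig_ctx


def patch_commands(model: str, target_ctx: int, dry_run: bool = True) -> list[list[str]]:
    """Build (not run) the metadata patch command for the context length key."""
    flags = ["--dry-run"] if dry_run else []  # assumed flag: preview without writing
    return [SET_METADATA + flags + [model, "llama.context_length", str(target_ctx)]]


def runtime_args(model: str, orig_ctx: int, target_ctx: int) -> list[str]:
    """Build a llama.cpp command line that applies linear RoPE scaling at load time."""
    scale = linear_rope_scale(orig_ctx, target_ctx)
    return [
        "llama-cli", "-m", model,          # placeholder binary name and model path
        "-c", str(target_ctx),
        "--rope-scaling", "linear",
        "--rope-scale", str(scale),
    ]


if __name__ == "__main__":
    for cmd in patch_commands("model.gguf", 16384):
        print(shlex.join(cmd))
    print(shlex.join(runtime_args("model.gguf", 4096, 16384)))
```

Keeping `dry_run=True` as the default means the printed patch command only previews the change. Back up the file before running it without that flag, since the patch is written in place.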
Journey Context:
Users often re-quantize from FP16 just to change the context length. GGUF stores its metadata as typed binary key-value pairs in the file header, separate from the tensor data that follows. Because a fixed-size scalar value can be overwritten without shifting any bytes after it, you can patch the header to enable RoPE scaling for contexts up to 128k without touching tensor data. This saves hours of re-quantization and avoids the quality loss from re-quantizing an already quantized file.
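To illustrate why such a patch leaves the tensor data alone, here is a toy sketch using only the standard library. It builds a tiny in-memory buffer that follows the GGUF header layout (magic, version, tensor count, key-value count, then one `uint32` key-value pair) and overwrites the value in place. The helper names are hypothetical, the trailing bytes are only a stand-in for tensor data, and a real file also contains tensor info entries and alignment padding that this sketch omits.

```python
import struct

GGUF_MAGIC = b"GGUF"
TYPE_UINT32 = 4  # GGUF value type id for a 32-bit unsigned integer


def build_toy_file(context_length: int, payload: bytes) -> bytearray:
    """Build a toy GGUF-style buffer with one uint32 key-value pair."""
    key = b"llama.context_length"
    buf = bytearray(GGUF_MAGIC)
    buf += struct.pack("<I", 3)              # format version
    buf += struct.pack("<Q", 0)              # tensor count (toy: none described)
    buf += struct.pack("<Q", 1)              # metadata key-value count
    buf += struct.pack("<Q", len(key)) + key # length-prefixed key string
    buf += struct.pack("<I", TYPE_UINT32)    # value type
    buf += struct.pack("<I", context_length) # the value itself
    buf += payload                           # stand-in for tensor data
    return buf


def find_uint32_value_offset(buf: bytearray, key: str) -> int:
    """Walk the header and return the byte offset of a uint32 value."""
    if bytes(buf[:4]) != GGUF_MAGIC:
        raise ValueError("not a GGUF buffer")
    (kv_count,) = struct.unpack_from("<Q", buf, 16)  # after magic, version, tensor count
    pos = 24
    for _ in range(kv_count):
        (key_len,) = struct.unpack_from("<Q", buf, pos)
        pos += 8
        name = bytes(buf[pos:pos + key_len]).decode("utf-8")
        pos += key_len
        (value_type,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        if value_type != TYPE_UINT32:
            raise NotImplementedError("toy parser only handles uint32 values")
        if name == key:
            return pos
        pos += 4  # skip this uint32 value
    raise KeyError(key)


def patch_uint32(buf: bytearray, key: str, value: int) -> None:
    """Overwrite a uint32 value in place; the buffer length never changes."""
    struct.pack_into("<I", buf, find_uint32_value_offset(buf, key), value)
```

The new value occupies exactly the same four bytes as the old one, so every offset after it, including the start of the tensor data, stays where it was. This is also why only existing fixed-size values are safe to patch this way: adding a key or lengthening a string would shift everything after it.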
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:14:34.201633+00:00— report_created — created