Report #36684
[tooling] Extending context length of GGUF model requires re-quantization from FP16
Use `gguf-py` to edit the metadata keys `llama.context_length` and `llama.rope.scale_linear` directly in the GGUF file, without re-converting from source.
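Below is a minimal sketch of that in-place edit. To keep the byte layout visible, it uses only the Python standard library (`struct`, `mmap`) instead of gguf-py's own API. Recent gguf-py releases ship scripts such as `gguf_set_metadata.py` for the same job. The sketch makes two assumptions: the file is a little-endian GGUF v2/v3 file, and the key already exists with a fixed-size numeric type. Adding a key the converter never wrote (for example, a missing `llama.rope.scale_linear`) changes the metadata size and shifts the tensor offsets. That case needs a rewrite of the file (e.g. gguf-py's `gguf_new_metadata.py`) rather than a patch. The helper names are hypothetical. Run it on a copy of the model.

```python
import mmap
import struct

# Sketch only: assumes a little-endian GGUF v2/v3 file; helper names are hypothetical.
# GGUF metadata value-type ids (per the GGUF spec) -> struct codes for fixed-size scalars.
SCALAR_FMT = {0: "B", 1: "b", 2: "H", 3: "h", 4: "I", 5: "i",
              6: "f", 7: "?", 10: "Q", 11: "q", 12: "d"}
GGUF_STRING, GGUF_ARRAY = 8, 9


def _skip_value(buf, off, vtype):
    """Return the offset just past a metadata value of type `vtype` at `off`."""
    if vtype in SCALAR_FMT:
        return off + struct.calcsize("<" + SCALAR_FMT[vtype])
    if vtype == GGUF_STRING:
        (length,) = struct.unpack_from("<Q", buf, off)
        return off + 8 + length
    if vtype == GGUF_ARRAY:
        # Array header: item type (uint32) + item count (uint64), then the items.
        item_type, count = struct.unpack_from("<IQ", buf, off)
        off += 12
        for _ in range(count):
            off = _skip_value(buf, off, item_type)
        return off
    raise ValueError(f"unknown GGUF value type {vtype}")


def _find_scalar(buf, key):
    """Walk the metadata KV section; return (value_offset, struct_format) for `key`."""
    magic, version, _n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF" or version not in (2, 3):
        raise ValueError("expected a little-endian GGUF v2/v3 file")
    off = 24  # header: magic + version + tensor count + KV count
    for _ in range(n_kv):
        (key_len,) = struct.unpack_from("<Q", buf, off)
        name = bytes(buf[off + 8 : off + 8 + key_len]).decode("utf-8")
        (vtype,) = struct.unpack_from("<I", buf, off + 8 + key_len)
        off += 8 + key_len + 4
        if name == key:
            if vtype not in SCALAR_FMT:
                raise TypeError(f"{key} is not a fixed-size numeric value")
            return off, "<" + SCALAR_FMT[vtype]
        off = _skip_value(buf, off, vtype)
    raise KeyError(key)


def get_gguf_scalar(path, key):
    """Read a numeric metadata value without loading tensor data."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        off, fmt = _find_scalar(mm, key)
        return struct.unpack_from(fmt, mm, off)[0]


def set_gguf_scalar(path, key, value):
    """Overwrite an existing numeric metadata value in place; return the old value.

    The value keeps its original type and size, so no offsets move and the
    quantized tensor data is left untouched.
    """
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        off, fmt = _find_scalar(mm, key)
        (old,) = struct.unpack_from(fmt, mm, off)
        struct.pack_into(fmt, mm, off, value)
        mm.flush()
    return old
```

Example (on a copy): `set_gguf_scalar("model-copy.gguf", "llama.context_length", 8192)` returns the previous value. The file is memory-mapped and only the metadata section is walked, so the multi-gigabyte tensor data is never read or rewritten.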
Journey Context:
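People commonly believe that to increase the context window (e.g., from 4k to 8k) you must re-run the conversion script on the original PyTorch weights. This is false for GGUF: the trained context length is just a metadata field (`llama.context_length`) stored in the file header, separate from the tensor data. You can edit it with the `gguf` Python package (`pip install gguf`) by reading the metadata key-value section, updating the `context_length` value, and writing it back. Because only metadata changes, the quantized weights are preserved as-is, and you save the hours that re-conversion and re-quantization would take. However, raising `context_length` alone does not let the model handle positions beyond its training range. You must also adjust RoPE scaling (e.g., `llama.rope.scale_linear`) to cover the longer window, consistent with any scaling the model was trained with; otherwise quality degrades.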
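The RoPE adjustment can be sketched as simple arithmetic. Under linear scaling (position interpolation), positions are divided by a factor equal to the target length over the trained length. Going from 4k to 8k therefore needs a factor of 2.0, and position 8191 is rotated like position 4095.5, which lies inside the trained range. The sketch assumes LLaMA-style defaults (head dimension 128, frequency base 10000) and uses hypothetical function names. With a llama.cpp-style loader, this factor is the value stored in `llama.rope.scale_linear`, and its reciprocal becomes the runtime frequency scale. Newer conversions may instead store it as `llama.rope.scaling.factor` alongside `llama.rope.scaling.type = linear`. llama.cpp reads that key in preference to the legacy one, so check which key your file carries.

```python
# Sketch only: LLaMA-style defaults (dim=128, base=10000) and hypothetical names.


def linear_rope_factor(train_ctx: int, target_ctx: int) -> float:
    """Context-expansion factor for linear RoPE scaling (position interpolation)."""
    if train_ctx <= 0 or target_ctx <= 0:
        raise ValueError("context lengths must be positive")
    # Inside the trained window no interpolation is needed, so never go below 1.0.
    return max(1.0, target_ctx / train_ctx)


def rope_angles(pos, dim=128, base=10000.0, factor=1.0):
    """Rotation angle of each rotary pair i at position `pos`.

    Linear scaling divides the position by `factor`, so positions in the
    extended window map back into the range seen during training.
    """
    return [(pos / factor) * base ** (-2 * i / dim) for i in range(dim // 2)]


factor = linear_rope_factor(4096, 8192)
print(factor)  # 2.0
# The last position of the 8k window is rotated like position 4095.5 unscaled.
print(rope_angles(8191, factor=factor) == rope_angles(4095.5))  # True
```

Linear interpolation is the simplest scheme to illustrate. Other scaling types (e.g. YaRN) use different keys and formulas.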
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:03:19.260922+00:00— report_created — created