Report #90829
[tooling] GGUF model limited to 4096 context despite supporting 32k in original PyTorch
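Use the `gguf-py` scripts to edit the metadata directly: dump the current values with `gguf-dump.py`, set `llama.context_length` to 32768 and `llama.rope.freq_base` to the value the model requires (10000.0 is the usual default; some extended-context models were trained with a larger base), then repack. No re-quantization is needed.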
Journey Context:
Many GGUF files ship with a conservative context limit in their metadata that does not reflect the model's actual RoPE-scaled capability. Users often re-convert from safetensors (hours of work) or rely on llama.cpp override flags (`--ctx-size`, `--rope-scale`), which must be passed on every launch. Editing the GGUF metadata directly is a one-time fix that persists. The risk is setting values the model was not trained for, but for known extended-context models (such as CodeLlama-34B, which supports 16k but ships as 4k), this is the correct workflow.
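To illustrate why the metadata edit needs no re-quantization, here is a minimal standard-library sketch that overwrites a fixed-width scalar in the GGUF header in place, leaving the tensor data untouched. It is not the `gguf-py` API: the helper names (`find_scalar`, `set_scalar`, `patch_file`) are hypothetical, and it assumes a little-endian GGUF file of version 2 or later in which the key already exists. Run it on a copy of the model file.

```python
import mmap
import struct

# Fixed-width GGUF scalar value types: type id -> struct format (little-endian).
SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
           6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
T_STRING, T_ARRAY = 8, 9


def _read(buf, fmt, pos):
    """Unpack one value at pos and return (value, offset after it)."""
    (value,) = struct.unpack_from(fmt, buf, pos)
    return value, pos + struct.calcsize(fmt)


def _skip_value(buf, vtype, pos):
    """Return the offset just past a value of type vtype starting at pos."""
    if vtype in SCALARS:
        return pos + struct.calcsize(SCALARS[vtype])
    if vtype == T_STRING:
        length, pos = _read(buf, "<Q", pos)
        return pos + length
    if vtype == T_ARRAY:
        etype, pos = _read(buf, "<I", pos)
        count, pos = _read(buf, "<Q", pos)
        for _ in range(count):
            pos = _skip_value(buf, etype, pos)
        return pos
    raise ValueError(f"unknown GGUF value type {vtype}")


def find_scalar(buf, key):
    """Locate a scalar metadata key; return (value_type, offset_of_value)."""
    if buf[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    version, pos = _read(buf, "<I", 4)
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts; not handled here")
    _tensor_count, pos = _read(buf, "<Q", pos)
    kv_count, pos = _read(buf, "<Q", pos)
    for _ in range(kv_count):
        klen, pos = _read(buf, "<Q", pos)
        name = bytes(buf[pos:pos + klen]).decode("utf-8")
        pos += klen
        vtype, pos = _read(buf, "<I", pos)
        if name == key:
            if vtype not in SCALARS:
                raise TypeError(f"{key} is not a fixed-width scalar")
            return vtype, pos
        pos = _skip_value(buf, vtype, pos)
    raise KeyError(key)


def set_scalar(buf, key, value):
    """Overwrite a scalar in place (buf must be writable); return the old value."""
    vtype, pos = find_scalar(buf, key)
    old, _ = _read(buf, SCALARS[vtype], pos)
    struct.pack_into(SCALARS[vtype], buf, pos, value)
    return old


def patch_file(path, key, value):
    """Memory-map the file read/write and patch one scalar key."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        return set_scalar(m, key, value)
```

For example, `patch_file("model-copy.gguf", "llama.context_length", 32768)` would return the previous limit and leave the file size unchanged. The value must match the stored type (an integer for the context length, a float for `llama.rope.freq_base`), and the sketch refuses to touch strings or arrays because changing their length would shift everything after them.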
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:03:02.453712+00:00— report_created — created