Report #75176
[tooling] GGUF file has 4k context limit but I need 32k without reconverting from PyTorch
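Override RoPE scaling at runtime without touching the GGUF: set the context size with `-c 32768` and stretch positions with `--rope-scale 8` (32k / 4k = 8; `--rope-scale 2.0` would only reach 8k). Alternatively, use NTK-aware scaling by raising `--rope-freq-base` above the model's trained value (read from the GGUF; 10000 for Llama-2). Longer context needs a higher base, not a lower one.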
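The arithmetic behind the flags is just target context divided by trained context. Below is a minimal sketch that builds the flag list for linear or YaRN scaling. It assumes the model was trained on 4096 tokens, and the helper name `rope_override_flags` is hypothetical. It emits only the llama.cpp options `-c`, `--rope-scale`, `--rope-scaling` and `--yarn-orig-ctx`.

```python
import shlex


def rope_override_flags(trained_ctx: int, target_ctx: int, method: str = "linear") -> list[str]:
    """Hypothetical helper: llama.cpp runtime flags that stretch RoPE context.

    Assumption: trained_ctx is the model's training context (4096 for Llama-2).
    """
    if target_ctx <= trained_ctx:
        raise ValueError("target_ctx must exceed trained_ctx")
    factor = target_ctx / trained_ctx  # 32768 / 4096 = 8.0
    # -c allocates the larger context; --rope-scale N sets freq_scale = 1/N.
    flags = ["-c", str(target_ctx), "--rope-scale", f"{factor:g}"]
    if method == "linear":
        # Position interpolation, stated explicitly rather than relying on GGUF metadata.
        flags += ["--rope-scaling", "linear"]
    elif method == "yarn":
        # YaRN also wants the original training context in case the GGUF lacks it.
        flags += ["--rope-scaling", "yarn", "--yarn-orig-ctx", str(trained_ctx)]
    else:
        raise ValueError(f"unsupported method: {method!r}")
    return flags


print(shlex.join(rope_override_flags(4096, 32768)))
print(shlex.join(rope_override_flags(4096, 32768, "yarn")))
```

Append the printed flags to your usual llama.cpp invocation. The model path and binary name stay whatever you already use.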
Journey Context:
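Most users assume that context extension requires re-converting the model with new RoPE scaling parameters baked into the GGUF metadata. It does not. llama.cpp reads those parameters at load time and lets command-line flags override them. For linear scaling (position interpolation), `--rope-scale N` expands the context by a factor of N, which is equivalent to a `--rope-freq-scale` of 1/N. Going from 4k to 32k needs N = 8, plus `-c 32768` so the larger context is actually allocated. For NTK-aware scaling, keep the positions unscaled and instead raise `--rope-freq-base` above the trained value. That value is 10000 for Llama-2, while CodeLlama already ships with 1000000 and was trained on 16k sequences. For YaRN, select it with `--rope-scaling yarn`, pass the factor with `--rope-scale`, and set `--yarn-orig-ctx 4096` if the GGUF does not record the original training context. Because these are runtime flags, the change is instant and reversible: remove them to revert to the original 4k behavior. An 8x extension on a model that was not fine-tuned for long context usually costs some output quality. The KV cache also grows in proportion to the context size, so check memory as well.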
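For a starting value for `--rope-freq-base`, the community "NTK-aware scaled RoPE" rule multiplies the base by s^(d/(d-2)), where s is the context stretch factor and d is the attention head dimension. The sketch below assumes Llama-2's trained base of 10000 and a head dimension of 128. The function name is hypothetical. Treat the result as a value to tune, not a guaranteed-good setting.

```python
def ntk_aware_rope_base(base: float, scale: float, head_dim: int = 128) -> float:
    """Hypothetical helper: NTK-aware RoPE base for stretching context by `scale`.

    Rule: base' = base * scale ** (head_dim / (head_dim - 2)).
    Assumption: head_dim=128, which matches Llama-2 7B/13B/70B.
    """
    if scale < 1 or head_dim <= 2:
        raise ValueError("need scale >= 1 and head_dim > 2")
    return base * scale ** (head_dim / (head_dim - 2))


for s in (2, 8):
    # Llama-2 trains with base 10000; the printed number goes to --rope-freq-base.
    print(f"x{s}: --rope-freq-base {ntk_aware_rope_base(10000, s):.0f}")
```

For the 4k to 32k case (s = 8) this gives roughly 82,700. Pass that with `-c 32768` and leave `--rope-scale` unset, because NTK-aware scaling changes the base rather than interpolating positions.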
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:46:38.870121+00:00— report_created — created