Report #85192
[tooling] Downloaded GGUF model only uses 4k context despite being Llama-3.1 8B
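Run `llama-server` or `llama-cli` with `--override-kv tokenizer.ggml.pre=str:llama-bpe --override-kv llama.context_length=int:131072 --override-kv llama.rope.freq_base=float:500000` (values taken from the original model's config.json) to override the GGUF metadata at load time without re-quantizing. Each override has the form `KEY=TYPE:VALUE`, where TYPE is `int`, `float`, `bool`, or `str`. An untyped value such as `llama.context_length=131072` is rejected. `llama.context_length` only records the trained context length. The context actually allocated is set with `-c`/`--ctx-size` (0 = use the model's value), so pass it explicitly as well (e.g. `-c 131072`, memory permitting).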
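A minimal Python sketch of assembling that command line. It assumes `llama-server` is on PATH and uses a placeholder model path `model.gguf`; the helper names `kv_override` and `build_command` are hypothetical, not part of llama.cpp. It formats each override in the typed `KEY=TYPE:VALUE` form that `--override-kv` expects, infers the type tag from the Python value, and sets `-c` so the runtime context matches the overridden metadata. It only prints the command and does not run it.

```python
import shlex

def kv_override(key, value):
    """Format one --override-kv argument as KEY=TYPE:VALUE."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return f"{key}=bool:{str(value).lower()}"
    if isinstance(value, int):
        return f"{key}=int:{value}"
    if isinstance(value, float):
        return f"{key}=float:{value}"
    if isinstance(value, str):
        return f"{key}=str:{value}"
    raise TypeError(f"unsupported override type for {key}: {type(value).__name__}")

def build_command(model_path, overrides, ctx_size):
    """Return an argv list for llama-server with the given metadata overrides."""
    argv = ["llama-server", "-m", model_path, "-c", str(ctx_size)]
    for key, value in overrides.items():
        argv += ["--override-kv", kv_override(key, value)]
    return argv

# Values quoted in this report; "model.gguf" is a placeholder path.
overrides = {
    "tokenizer.ggml.pre": "llama-bpe",
    "llama.context_length": 131072,
    "llama.rope.freq_base": 500000.0,
}
print(shlex.join(build_command("model.gguf", overrides, 131072)))
```

Passing a list to `subprocess.run` instead of printing would launch the server. Keeping argv as a list avoids shell-quoting issues.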
Journey Context:
GGUF metadata is sometimes written incorrectly during conversion, especially for newer architectures such as Llama 3.1/3.2. Users waste hours re-quantizing when they only need to override a few key-value pairs. The main challenge is mapping config.json keys to GGUF keys (e.g., `rope_theta` → `llama.rope.freq_base`, `max_position_embeddings` → `llama.context_length`). The `--override-kv` flag allows these surgical fixes.
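The config.json to GGUF key mapping can be sketched in Python as a small lookup table. Only the two keys relevant to this report are covered, and only for the `llama` architecture prefix. The table `HF_TO_GGUF_LLAMA`, the function `config_to_overrides`, and the inline sample config are illustrative assumptions, not part of llama.cpp.

```python
import json

# Hypothetical partial mapping for the "llama" architecture:
# config.json key -> (GGUF metadata key, --override-kv type tag)
HF_TO_GGUF_LLAMA = {
    "max_position_embeddings": ("llama.context_length", "int"),
    "rope_theta": ("llama.rope.freq_base", "float"),
}

def config_to_overrides(config):
    """Turn a parsed config.json dict into KEY=TYPE:VALUE override strings."""
    overrides = []
    for hf_key, (gguf_key, type_tag) in HF_TO_GGUF_LLAMA.items():
        if hf_key not in config:
            continue  # skip keys this config does not define
        raw = config[hf_key]
        value = int(raw) if type_tag == "int" else float(raw)
        overrides.append(f"{gguf_key}={type_tag}:{value}")
    return overrides

# Sample values matching those quoted in this report; in practice, load the
# model's own file, e.g. with open("config.json") as f: config = json.load(f)
sample = json.loads('{"max_position_embeddings": 131072, "rope_theta": 500000.0}')
for override in config_to_overrides(sample):
    print("--override-kv", override)
```

The type tag is fixed per key rather than inferred from the JSON value. This means a `rope_theta` stored as an integer in config.json is still emitted with `float:`.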
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:34:55.299888+00:00— report_created — created