Report #75181
[tooling] Need to fix GGUF metadata (context length, architecture string) without re-quantizing 70GB file
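Use the `gguf-py` toolkit that ships with llama.cpp: `python -m gguf.scripts.gguf_set_metadata model.gguf <key> <value>` (also installed as the `gguf-set-metadata` command in recent releases) overwrites a single scalar key-value pair in place without rewriting tensors. The script edits the file it is given, and upstream versions do not appear to accept an `--output` flag, so keep a backup or run with `--dry-run` first.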
Journey Context:
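Quantizing a 70B model takes hours. If the original conversion wrote the wrong context length or RoPE parameters into the GGUF metadata, you do not need to re-run conversion or quantization. The `gguf-py` package (shipped in llama.cpp/gguf-py) provides scripts for manipulating metadata. `gguf-set-metadata.py` (`gguf_set_metadata.py` in newer trees, or the module invocation) makes surgical edits to fixed-width values such as integers, floats, and booleans, and is distinct from the read-only `gguf-dump.py`. Strings and arrays change the size of the header, so the in-place tool refuses them. Fixing an architecture string therefore means writing a new file, for example with `gguf-new-metadata.py`, which copies the tensor data without re-quantizing, provided your version of that script exposes the key you need. Context length is stored under the architecture prefix (`<arch>.context_length`, e.g., `llama.context_length`), not under `general.`. Warning: metadata edits do not touch tensor data, but a wrong value (especially a wrong architecture string) can make the model fail to load or produce garbage output, so only modify keys you understand (e.g., `llama.context_length`, `llama.rope.freq_base`).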
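To illustrate why a fixed-width value can be patched without touching the tensors while a string cannot, here is a minimal standard-library sketch. It assumes a little-endian GGUF v2/v3 header layout as described in the GGUF spec; `patch_scalar` and its helpers are hypothetical names for illustration, not part of `gguf-py`. For real models, prefer the upstream scripts.

```python
import struct

# GGUF value type ids -> struct format for fixed-width scalars.
# Assumption: little-endian file (the common case).
SCALAR_FMT = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
              6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
T_STRING, T_ARRAY = 8, 9


def _read(f, fmt):
    """Read and unpack a single value of the given struct format."""
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]


def _skip_value(f, vtype):
    """Advance past one metadata value without decoding it."""
    if vtype in SCALAR_FMT:
        f.seek(struct.calcsize(SCALAR_FMT[vtype]), 1)
    elif vtype == T_STRING:
        f.seek(_read(f, "<Q"), 1)  # uint64 length, then raw bytes
    elif vtype == T_ARRAY:
        etype, count = _read(f, "<I"), _read(f, "<Q")
        if etype in SCALAR_FMT:
            f.seek(count * struct.calcsize(SCALAR_FMT[etype]), 1)
        else:
            for _ in range(count):
                _skip_value(f, etype)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


def patch_scalar(path, key, new_value):
    """Overwrite one fixed-width metadata value in place; return the old value."""
    with open(path, "r+b") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        if _read(f, "<I") < 2:
            raise ValueError("this sketch only handles GGUF v2 and later")
        _tensor_count, kv_count = _read(f, "<Q"), _read(f, "<Q")
        for _ in range(kv_count):
            name = f.read(_read(f, "<Q")).decode("utf-8")
            vtype = _read(f, "<I")
            if name != key:
                _skip_value(f, vtype)
                continue
            if vtype not in SCALAR_FMT:
                # Strings/arrays would change the header size and shift every tensor offset.
                raise TypeError(f"{key!r} is not a fixed-width scalar")
            fmt = SCALAR_FMT[vtype]
            pos = f.tell()
            old = _read(f, fmt)
            f.seek(pos)
            f.write(struct.pack(fmt, new_value))  # same byte width, so nothing moves
            return old
    raise KeyError(key)
```

Called as `patch_scalar("model.gguf", "llama.context_length", 8192)` on a backup copy, this returns the previous value and leaves the file size unchanged. Passing a string key such as `general.architecture` raises `TypeError`, which mirrors the reason the in-place tool is limited to simple values.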
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:47:21.726012+00:00— report_created — created