Report #69141
[tooling] Why does llama.cpp crash or fail to load my GGUF model with 'unknown architecture' or tensor shape errors?
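Inspect the GGUF architecture metadata before loading. The gguf-dump script from the gguf Python package (pip install gguf), or the metadata lines llama.cpp itself logs while it starts loading a model, show the 'general.architecture' key and the tensor name prefixes. If the file reports a recently added architecture such as 'gemma2' or 'qwen2', verify that your llama.cpp binary was built from a commit after support for that architecture was added (check the release notes, or search the git log for the architecture name or for changes to llama-arch). Mixtral-style MoE files usually report 'llama' plus a 'llama.expert_count' key, so check that key as well. Do not rely on file extensions alone: a .gguf extension says nothing about which architecture the file needs.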
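To show what such a dump actually reads, here is a minimal sketch that parses the GGUF header with only the Python standard library. It assumes the GGUF v2/v3 layout (little-endian, uint64 counts and string lengths) and skips v1 files; inspect_gguf and its helpers are hypothetical names, not part of llama.cpp or the gguf package. For day-to-day use, the gguf package's own gguf-dump script is the more robust choice.

```python
import struct
import sys
from typing import Any, BinaryIO

# GGUF metadata value type IDs (GGUF spec, v2/v3) mapped to struct formats
_SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    """Read one little-endian value, failing loudly on a truncated file."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    length = _read(f, "<Q")  # v2+: uint64 length prefix, no terminator
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    if vtype in _SCALAR_FORMATS:
        return _read(f, _SCALAR_FORMATS[vtype])
    raise ValueError(f"unknown GGUF value type {vtype}")


def inspect_gguf(path: str) -> tuple[dict[str, Any], list[str]]:
    """Return (metadata, tensor_names) read from a GGUF v2/v3 header."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file (bad magic)")
        version = _read(f, "<I")
        if version < 2:
            raise ValueError(f"GGUF v{version} is not handled by this sketch")
        n_tensors = _read(f, "<Q")
        n_kv = _read(f, "<Q")
        metadata: dict[str, Any] = {}
        for _ in range(n_kv):
            key = _read_string(f)
            metadata[key] = _read_value(f, _read(f, "<I"))
        names: list[str] = []
        for _ in range(n_tensors):
            names.append(_read_string(f))
            n_dims = _read(f, "<I")
            # skip dims (uint64 each), ggml type (uint32), data offset (uint64)
            f.read(8 * n_dims + 4 + 8)
        return metadata, names


if __name__ == "__main__" and len(sys.argv) > 1:
    meta, tensors = inspect_gguf(sys.argv[1])
    print("general.architecture:", meta.get("general.architecture"))
    # llama-style files typically show prefixes like 'blk', 'output_norm', 'token_embd'
    print("tensor name prefixes:", sorted({t.split(".")[0] for t in tensors}))
```

Saved under any file name and run with a model path as its only argument, it prints the architecture string and the distinct tensor name prefixes. Only the header is read; tensor data is never touched, so this is fast even on multi-gigabyte files.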
Journey Context:
Users frequently download new models (e.g., Gemma 2, Qwen2, Phi-3) converted to GGUF by third parties, then hit cryptic errors such as 'unexpected token type' or 'tensor shape mismatch'. Assuming file corruption, they waste time re-quantizing or re-downloading. The root cause is usually a llama.cpp binary that predates support for the model's architecture. GGUF files carry explicit architecture metadata (general.architecture), but users rarely inspect it. Dumping the metadata reveals it instantly, so the user can check whether their llama.cpp version supports that architecture (via the release notes or commit history) before debugging further. This avoids the common mistake of running a mixture-of-experts (MoE) model on an old llama.cpp build that cannot handle its expert tensors.
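The triage described here can be sketched as a small function over already-parsed metadata, for example a dict produced by a GGUF header reader. Both diagnose and supported_archs are hypothetical: llama.cpp does not export its supported architectures in this form, so you would fill the set yourself from the release notes or the architecture names in your own checkout. The sketch also assumes the '{arch}.expert_count' key that MoE conversions typically write.

```python
from typing import Any


def diagnose(metadata: dict[str, Any], supported_archs: set[str]) -> str:
    """Suggest a next step from GGUF metadata before re-downloading anything.

    supported_archs is hypothetical user input: the architecture names your
    llama.cpp build is known to handle.
    """
    arch = metadata.get("general.architecture")
    if arch is None:
        return "no general.architecture key: file may be truncated or not GGUF"
    # MoE conversions record an expert count under the architecture's namespace
    n_experts = metadata.get(f"{arch}.expert_count", 0)
    kind = f"MoE with {n_experts} experts" if n_experts else "dense"
    if arch not in supported_archs:
        return f"'{arch}' ({kind}) is not supported by this build: update llama.cpp"
    return f"'{arch}' ({kind}) is supported: investigate other causes"
```

The architecture name alone will not catch a build that knows 'llama' but predates MoE support, which is why the message also surfaces the expert count: a nonzero count on an old build is itself a reason to update.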
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:32:12.935843+00:00— report_created — created