Report #4688
[tooling] Loading GGUF fails with 'invalid tensor shape' or architecture mismatch - need to inspect metadata without loading full model
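Use `python gguf-py/scripts/gguf-dump.py model.gguf --no-tensors` to inspect the key-value metadata (architecture, quantization/file type, vocabulary) without loading the model into VRAM/RAM. Note that `--no-tensors` suppresses the tensor listing, so drop the flag when you need tensor names and shapes. Depending on the llama.cpp version, the script may live at a different path or be available as a `gguf-dump` command after `pip install gguf`.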
Journey Context:
People often try to load a model and get cryptic tensor-mismatch errors (e.g., when loading a MoE model in an older llama.cpp version). Instead of trial-and-error loading, gguf-dump shows the exact metadata: architecture (llama, falcon, etc.), quantization scheme (Q4_K_M vs Q5_K_S), and, when run without `--no-tensors`, tensor names and shapes. This is crucial for debugging 'token embedding size mismatch' errors, which typically appear when the model was converted with a different vocabulary size or when the file is corrupted or incomplete.
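If the dump script is not at hand, the header and key-value metadata can also be read directly. The following is a minimal sketch, not the gguf-py implementation: it assumes a little-endian GGUF v2 or v3 file, uses only the standard library, and stops before the tensor table so no tensor data is touched. The names `read_gguf_metadata`, `_read`, `_read_string`, and `_read_value` are hypothetical helpers introduced for illustration.

```python
import struct

GGUF_MAGIC = b"GGUF"

# GGUF metadata value type IDs mapped to little-endian struct formats (scalars only).
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one fixed-size value; raise if the file ends early."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("file truncated while reading metadata")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("file truncated while reading a string")
    return data.decode("utf-8", errors="replace")


def _read_value(f, vtype):
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        # Arrays store the element type, then the count, then the elements.
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown metadata value type {vtype}")


def read_gguf_metadata(path):
    """Return (version, tensor_count, metadata dict) without reading tensor data."""
    with open(path, "rb") as f:
        if f.read(4) != GGUF_MAGIC:
            raise ValueError("not a GGUF file (bad magic)")
        version = _read(f, "<I")
        if version < 2:
            # Assumption: v1 files (32-bit counts) are out of scope for this sketch.
            raise ValueError("GGUF v1 is not handled in this sketch")
        tensor_count = _read(f, "<Q")
        kv_count = _read(f, "<Q")
        metadata = {}
        for _ in range(kv_count):
            key = _read_string(f)
            metadata[key] = _read_value(f, _read(f, "<I"))
    return version, tensor_count, metadata
```

For example, `version, n_tensors, meta = read_gguf_metadata("model.gguf")` followed by `meta.get("general.architecture")` and `len(meta.get("tokenizer.ggml.tokens", []))` gives the architecture string and the vocabulary size stored in the file. A `ValueError` about truncation is a strong hint that the download or conversion was incomplete.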
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:54:41.371411+00:00— report_created — created