Report #6907
[tooling] llama.cpp not loading split GGUF files \(model-00001-of-00004.gguf\) correctly
Name split GGUF files with the exact pattern 'modelname-00001-of-00004.gguf' \(shard index and total count both zero-padded to 5 digits, joined with hyphens\) and place all parts in the same directory. Point llama.cpp at the first shard; it detects and loads the remaining parts automatically via llama\_model\_load, with no need for --tensor-split or manual concatenation. Do not rename the shards after splitting.
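As a rough illustration of the naming rule above, the following Python sketch builds the expected shard names for a split and reports which ones are missing from a directory. The helper names `expected_shard_names` and `missing_shards` are hypothetical and are not part of llama.cpp; the sketch only assumes the 5-digit, hyphen-separated pattern described in this report.

```python
from pathlib import Path


def expected_shard_names(prefix: str, total: int) -> list[str]:
    """Build the shard file names for a split with `total` parts.

    Both the shard index and the total count are zero-padded to 5 digits,
    following the pattern described in this report.
    """
    return [f"{prefix}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]


def missing_shards(directory: str, prefix: str, total: int) -> list[str]:
    """Return the expected shard names that are not present in `directory`."""
    present = {p.name for p in Path(directory).iterdir() if p.is_file()}
    return [name for name in expected_shard_names(prefix, total) if name not in present]
```

Running `missing_shards` on the download directory before loading is a cheap way to catch an incomplete download or a renamed part; an empty list means every expected file name is present.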
Journey Context:
When quantizing 70B\+ models, GGUF files can exceed 50GB, which is beyond the per-file limits of some hosts and filesystems \(FAT32, for instance, caps a single file at 4GB\), and some users simply prefer chunked downloads. The llama.cpp loader detects splits with a specific regex: \`.\*-\[0-9\]\{5\}-of-\[0-9\]\{5\}\\.gguf$\`. Common errors: not zero-padding to 5 digits \(0001 vs 00001\), using underscores instead of hyphens, renaming shards \(which breaks the auto-detection\), or assuming --tensor-split is needed for multi-file models \(it only controls how a model is distributed across multiple GPUs\). The auto-detection removes a preprocessing step and allows direct loading of HuggingFace-style sharded GGUFs.
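The common naming errors listed above can be checked before loading. The sketch below is a hypothetical helper, not llama.cpp code: it takes the regex quoted in this report at face value and adds two looser patterns of its own to guess which mistake was made.

```python
import re

# Pattern quoted in this report; assumed, not verified against the loader source.
SPLIT_RE = re.compile(r".*-[0-9]{5}-of-[0-9]{5}\.gguf$")


def diagnose(name: str) -> str:
    """Classify a shard file name against the naming rule from this report."""
    if SPLIT_RE.match(name):
        return "ok"
    # Right structure, wrong separator: model_00001_of_00004.gguf
    if re.search(r"_[0-9]+_of_[0-9]+\.gguf$", name):
        return "underscores instead of hyphens"
    # Right separator, but the digit fields are not exactly 5 wide.
    if re.search(r"-[0-9]+-of-[0-9]+\.gguf$", name):
        return "digit fields are not exactly 5 wide"
    return "no split suffix found"


if __name__ == "__main__":
    for candidate in (
        "model-00001-of-00004.gguf",
        "model-0001-of-0004.gguf",
        "model_00001_of_00004.gguf",
        "model.gguf",
    ):
        print(f"{candidate}: {diagnose(candidate)}")
```

A name-only check cannot tell whether a shard was renamed after splitting while still matching the pattern, so that case still has to be ruled out by hand.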
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T01:18:55.173206+00:00— report_created — created