Report #17812
[tooling] Distributing large GGUF models on USB drives or other FAT32 filesystems fails because of the 4 GiB per-file limit
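Use `llama-gguf-split --split --split-max-size 4G <model.gguf> <output-prefix>` to shard the model into files of roughly 4 GB or less; no manual reassembly is needed, because the llama.cpp model loader picks up the remaining shards automatically when it is pointed at the first one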
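The split step can be scripted. The following is a minimal sketch, assuming `llama-gguf-split` is on `PATH` and accepts the `--split` and `--split-max-size` options as in current llama.cpp builds; the helper names `build_split_command` and `split_model` are hypothetical and exist only for this illustration.

```python
import shutil
import subprocess


def build_split_command(model_path, out_prefix, max_size="4G",
                        binary="llama-gguf-split"):
    """Return the argv list that splits model_path into shards of at most max_size.

    The helper name and defaults are illustrative, not part of llama.cpp.
    """
    return [binary, "--split", "--split-max-size", max_size,
            model_path, out_prefix]


def split_model(model_path, out_prefix, max_size="4G"):
    """Run the split; raises if the tool is missing or exits with an error."""
    cmd = build_split_command(model_path, out_prefix, max_size)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

Calling `split_model("model-70b-q4.gguf", "model-70b-q4", max_size="3G")` would leave some headroom under the FAT32 limit; the file names here are placeholders.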
Journey Context:
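Users training or quantizing 70B+ models typically end up with a single GGUF file of 40 GB or more. FAT32, still common on USB and other external drives, caps each file at 4 GiB minus one byte (4,294,967,295 bytes). The usual assumption is that the drive must be reformatted to exFAT or NTFS, which is not an option on some embedded systems. `llama-gguf-split` accepts `--split-max-size` with an `M` or `G` suffix and writes shards named like `<prefix>-00001-of-00011.gguf`. There is no separate manifest file; each shard records its index and the total shard count in its own GGUF metadata. When given the first shard, the llama.cpp loader finds the remaining shards by name and loads their tensors as a single model. Because the FAT32 limit sits just under 4 GiB, check the resulting shard sizes and drop to a smaller value such as `3G` if any shard is too large. On hardware that can only read FAT32, sharding may be the only practical way to distribute the model.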
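Before copying shards to the drive, it can help to confirm that the set is complete and that every file fits under the FAT32 limit. The following is a rough sketch, assuming the `<prefix>-NNNNN-of-NNNNN.gguf` naming that `llama-gguf-split` uses in current llama.cpp builds; `check_shards` is a hypothetical helper, and it only inspects file names and sizes, not GGUF metadata.

```python
import re
from pathlib import Path

# FAT32 stores the file size in a 32-bit field, so the largest file is 2**32 - 1 bytes.
FAT32_MAX_FILE_BYTES = 2**32 - 1

# Assumed shard naming: <prefix>-00001-of-00011.gguf
SHARD_RE = re.compile(r"^(?P<prefix>.+)-(?P<no>\d{5})-of-(?P<count>\d{5})\.gguf$")


def check_shards(directory, prefix):
    """Return a list of problems; an empty list means the set looks usable on FAT32."""
    problems = []
    seen = {}
    expected = None
    for path in sorted(Path(directory).iterdir()):
        m = SHARD_RE.match(path.name)
        if not m or m.group("prefix") != prefix:
            continue  # not a shard of this model
        no, count = int(m.group("no")), int(m.group("count"))
        if expected is None:
            expected = count
        elif count != expected:
            problems.append(f"{path.name}: shard count {count} != {expected}")
        seen[no] = path
        size = path.stat().st_size
        if size > FAT32_MAX_FILE_BYTES:
            problems.append(f"{path.name}: {size} bytes exceeds the FAT32 limit")
    if expected is None:
        problems.append("no shards found")
    else:
        missing = [n for n in range(1, expected + 1) if n not in seen]
        if missing:
            problems.append(f"missing shards: {missing}")
    return problems
```

An empty return value from `check_shards("/path/to/shards", "model-70b-q4")` (both arguments are placeholders) suggests the files are safe to copy; any reported oversize shard is a sign to re-split with a smaller `--split-max-size`.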
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:24:35.038101+00:00— report_created — created