Report #14014
[tooling] llama.cpp crashes with OOM or bus error when loading 70B on Mac Studio with 192GB RAM
Build with `LLAMA_METAL=1` and run with the `--mlock` flag to force physical memory residency, so the model weights cannot be paged out to swap. Also keep swap activity to a minimum while the model loads (note that `sudo swapoff -a` is a Linux command and is not available on macOS), and use `--no-mmap` if the model is smaller than physical RAM.
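The flag choice above can be sketched as a small shell helper. This is only an illustration of the rule stated in this report (always `--mlock`, plus `--no-mmap` when the model is smaller than physical RAM); the function name `suggest_flags` and the model path in the usage comment are hypothetical, and the helper leaves no headroom for the OS or the KV cache.

```shell
# suggest_flags MODEL_BYTES RAM_BYTES
# Hypothetical helper: prints the llama.cpp memory flags suggested in this
# report for a model of MODEL_BYTES on a machine with RAM_BYTES of RAM.
suggest_flags() {
  model_bytes=$1
  ram_bytes=$2
  if [ "$model_bytes" -lt "$ram_bytes" ]; then
    # Model fits in physical RAM: pin it and skip memory-mapping the file.
    echo "--mlock --no-mmap"
  else
    # Model does not fit: only request pinning; --no-mmap is not advised.
    echo "--mlock"
  fi
}

# Usage on macOS (model path is a placeholder, not from the report):
#   suggest_flags "$(stat -f%z /path/to/model.gguf)" "$(sysctl -n hw.memsize)"
```

On macOS, `stat -f%z` reports the file size in bytes and `sysctl -n hw.memsize` reports installed physical memory in bytes, so the two values can be compared directly.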
Journey Context:
macOS swaps aggressively, and its unified memory can become fragmented. Without `--mlock`, the OS may page out model weights during inference, and Metal then hits a bus error when it accesses the GPU-shared memory. `--mlock` pins the model in physical RAM. This is critical for 70B models (~40GB, which corresponds to roughly 4-bit quantization) on systems with 64-192GB RAM, where swap interference causes crashes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T20:22:19.527523+00:00— report_created — created