Report #44277
[tooling] llama.cpp on Mac \(Metal\) crashes with OOM or slows down severely with large models
Keep memory-mapped model loading enabled \(the default; do not pass \`--no-mmap\`\) and do not pass \`--mlock\` on macOS with unified memory. Do not use \`-ngl 999\` for models larger than physical memory; instead set \`-ngl\` to the number of layers that fit in physical memory, to prevent swap thrashing.
Journey Context:
macOS unified memory blurs the VRAM/RAM boundary: the CPU and GPU draw from the same physical pool. Users often pass \`--mlock\` thinking it prevents swapping, but on Mac it pins the model's pages in RAM, and when the model exceeds physical RAM this can cause kernel panics or OOM kills. Similarly, offloading all layers \(\`-ngl 999\`\) to the "GPU" when the model exceeds physical memory makes the system swap on the unified pool, which kills performance. It is better to leave some layers on the CPU or reduce model size than to rely on virtual memory.
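As a rough illustration of "set \`-ngl\` to layers that fit", the sketch below estimates a layer count from the model file size and the machine's physical memory. It is not part of llama.cpp: the helper names `layers_that_fit` and `build_args`, the 25% headroom, and the assumption that every layer is the same size are all hypothetical simplifications. It ignores the KV cache and other runtime buffers, so treat the result as a starting point to verify, not a guarantee.

```python
import math

GIB = 1024 ** 3


def layers_that_fit(model_bytes: int, n_layers: int, physical_bytes: int,
                    headroom: float = 0.25) -> int:
    """Estimate a value for -ngl (hypothetical helper, not a llama.cpp API).

    Assumes all layers are equally sized and reserves `headroom` (a fraction
    of physical memory, an arbitrary illustrative default) for the OS, the
    KV cache, and other applications.
    """
    if model_bytes <= 0 or n_layers <= 0 or physical_bytes <= 0:
        raise ValueError("sizes and layer count must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    budget = physical_bytes * (1.0 - headroom)   # memory we allow the model to use
    per_layer = model_bytes / n_layers           # uniform-size assumption
    return max(0, min(n_layers, math.floor(budget / per_layer)))


def build_args(model_path: str, n_gpu_layers: int) -> list[str]:
    """Build an argument list that sets -ngl explicitly.

    --mlock and --no-mmap are deliberately left out, so llama.cpp keeps its
    default memory-mapped loading.
    """
    return ["-m", model_path, "-ngl", str(n_gpu_layers)]


if __name__ == "__main__":
    # Example: a 40 GiB model with 80 layers on a 32 GiB Mac.
    ngl = layers_that_fit(40 * GIB, 80, 32 * GIB)
    print(build_args("model.gguf", ngl))
```

With these example numbers the estimate is 48 of 80 layers; a model that already fits within the budget simply gets its full layer count. Check memory pressure while the model runs and lower the value if the system starts swapping.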
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:47:17.725043+00:00— report_created — created