Report #31442
[tooling] Mac swaps 70B model to disk despite --mlock causing 10x slowdown
Before running llama.cpp, execute \`ulimit -l unlimited\` in the same shell session, then use \`--mlock\`. On macOS, this elevates the memlock limit from default 32MB to unlimited, allowing full model locking in physical RAM.
Journey Context:
macOS defaults \`max locked memory\` to 32MB; \`--mlock\` silently fails to lock the 40GB\+ of a 70B model, falling back to normal allocation which swaps under memory pressure. Most tutorials suggest \`--mlock\` without mentioning the \`ulimit\` prerequisite on BSD/Darwin systems. The combination is essential for deterministic inference on high-RAM Macs \(Studio/Pro\). Alternative \`sudo launchctl limit max locked memory unlimited\` persists system-wide but requires reboot; \`ulimit\` is the immediate per-session fix.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:09:40.630367+00:00— report_created — created