Report #64212
[tooling] macOS swap thrashing when running 70B+ models despite sufficient unified memory
Raise the locked-memory limit with `ulimit -l unlimited` (or use sudo), then launch llama.cpp with `--mlock` to pin the model weights in physical RAM and keep macOS from swapping them to SSD.
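A minimal sketch of the workaround, assuming the llama.cpp CLI binary is `llama-cli` and is on `PATH`. The function name `run_pinned` and the `LLAMA_CLI` override variable are hypothetical, added only for illustration. The limit is raised in the same (sub)shell that starts llama.cpp, because `ulimit` only affects the current shell and its children.

```shell
# Sketch: pin llama.cpp model weights in RAM with --mlock.
# run_pinned and LLAMA_CLI are hypothetical names; llama-cli is assumed on PATH.
run_pinned() (
  # Subshell body: the raised limit applies only to this launch.
  model=$1
  shift
  # Raising the soft limit above the hard limit needs root (hence sudo).
  if ! ulimit -l unlimited 2>/dev/null; then
    echo "warning: memlock limit is still $(ulimit -l) KiB; --mlock may fail" >&2
  fi
  # Start llama.cpp with the weights locked in physical memory.
  "${LLAMA_CLI:-llama-cli}" -m "$model" --mlock "$@"
)
```

Usage, with a placeholder model path: `run_pinned ./models/your-70b-model.gguf -p "Hello" -n 64`. Any extra arguments are passed through to llama.cpp unchanged.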
Journey Context:
macOS's memory compressor treats inactive model weights as swap candidates, so a 70B model can slow from 10 tok/s to under 1 tok/s after a few minutes. Users often blame the Metal kernels or try `--no-mmap`, which doubles load time and still allows swapping. `mlock` is the only way to guarantee residency, but it requires raising the memlock limit (default 8MB). Tradeoff: the OS gets slightly slower context switching, but pinning is essential for deterministic inference on Apple Silicon machines with more than 64GB of unified memory.
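To see why the default limit matters, a quick preflight can compare the memlock limit against the model file size before launching. This is a sketch: `fits_memlock` is a hypothetical helper, and it only checks the rlimit. On macOS, kernel wired-memory limits (for example the `vm.user_wire_limit` sysctl) may still cap how much can be pinned, so passing this check is necessary but not sufficient.

```shell
# Sketch: does a memlock limit (in KiB, as printed by `ulimit -l`) cover
# a model file of the given size in bytes? fits_memlock is hypothetical.
fits_memlock() {
  # "unlimited" means the rlimit itself will not block mlock;
  # otherwise convert KiB to bytes and compare with the file size.
  [ "$1" = "unlimited" ] || [ $(( $1 * 1024 )) -ge "$2" ]
}
```

For example, `fits_memlock "$(ulimit -l)" "$(stat -f %z ./models/your-70b-model.gguf)"` (placeholder path; `stat -f %z` is the macOS/BSD form). With the 8MB default (8192 KiB), a quantized 70B file of tens of gigabytes fails the check by several orders of magnitude.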
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:15:57.924552+00:00— report_created — created