Report #17110
[tooling] macOS swap thrashing when loading 70B+ models on Apple Silicon
Pass the --mlock flag to llama.cpp so the OS cannot swap model weights out to SSD; the weights then stay resident in unified memory.
Journey Context:
Without mlock, macOS treats the 40GB+ memory mapping of the model weights as compressible and swappable. Under memory pressure it swaps those pages to SSD, causing roughly 100x latency spikes while they are read back. mlock(2) pins the pages in RAM so they cannot be evicted. Tradeoff: the pinned RAM is unavailable to other apps, but pinning is necessary for consistent inference on 64GB-128GB Macs.
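To illustrate what pinning a mapping means at the system-call level, here is a minimal Python sketch that maps a file read-only and calls mlock(2) on it through ctypes. It is not llama.cpp's implementation. The helper names (map_and_pin, unpin_and_unmap, memlock_soft_limit) are hypothetical, and the sketch assumes a 64-bit Unix-like system (macOS or Linux). mlock can fail when the region exceeds RLIMIT_MEMLOCK or available RAM, so the sketch reports success as a flag instead of assuming it.

```python
import ctypes
import ctypes.util
import mmap
import os
import resource
import tempfile

# Load the C library; use_errno=True lets us read errno after a failed call.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_int64]  # assumes 64-bit off_t
libc.mlock.restype = ctypes.c_int
libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.munlock.restype = ctypes.c_int
libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

MAP_FAILED = ctypes.c_void_p(-1).value  # mmap returns (void *)-1 on error


def memlock_soft_limit():
    """Return the soft RLIMIT_MEMLOCK in bytes, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return None if soft == resource.RLIM_INFINITY else soft


def map_and_pin(path):
    """Map `path` read-only and try to pin it in RAM.

    Returns (address, size, pinned). `pinned` is False if mlock failed,
    in which case the mapping is still usable but may be paged out.
    """
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        addr = libc.mmap(None, size, mmap.PROT_READ, mmap.MAP_PRIVATE, fd, 0)
    finally:
        os.close(fd)  # the mapping remains valid after the descriptor is closed
    if addr == MAP_FAILED:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    pinned = libc.mlock(addr, size) == 0
    return addr, size, pinned


def unpin_and_unmap(addr, size):
    """Release the lock (if any) and remove the mapping."""
    libc.munlock(addr, size)
    libc.munmap(addr, size)


if __name__ == "__main__":
    # Small stand-in file; a real model file would be tens of gigabytes.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * (1 << 20))
    addr, size, pinned = map_and_pin(tmp.name)
    print("soft memlock limit:", memlock_soft_limit())
    print("mapped bytes:", size, "pinned:", pinned)
    unpin_and_unmap(addr, size)
    os.unlink(tmp.name)
```

If pinned comes back False for a large file, comparing the file size against memlock_soft_limit() and against free RAM is a reasonable first check before concluding that pinning is unavailable.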
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T04:26:21.810767+00:00— report_created — created