Report #3861
[tooling] llama.cpp on Mac swaps to disk during long generations, causing stalls
Add --mlock to the command line so that llama.cpp asks the OS (via mlock) to keep the model weights locked in RAM instead of swapping them to disk. On macOS, also pass --no-mmap: as described under Journey Context, memory-mapped model files are reportedly not held resident by --mlock there.
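A minimal sketch of assembling the invocation, written in Python so the flag combination is explicit. The binary name `llama-cli`, the model path, the prompt, and the helper `build_llama_cmd` are assumptions for illustration; `--mlock` and `--no-mmap` are the flags named in this report. Whether mmap is disabled is left as a parameter, since that interaction is the part to verify on your own machine.

```python
import shlex


def build_llama_cmd(model_path, prompt, mlock=True, no_mmap=True, binary="llama-cli"):
    """Return the argument list for a llama.cpp run (hypothetical helper)."""
    cmd = [binary, "-m", model_path, "-p", prompt]
    if mlock:
        # Ask llama.cpp to lock model memory in RAM.
        cmd.append("--mlock")
    if no_mmap:
        # Load the model into regular memory instead of memory-mapping the file.
        cmd.append("--no-mmap")
    return cmd


if __name__ == "__main__":
    # Placeholder path and prompt; substitute your own.
    cmd = build_llama_cmd("models/model.gguf", "Hello")
    print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be passed to `subprocess.run` once the path points at a real model file.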
Journey Context:
Users on MacBooks with unified memory often assume 64GB is enough, but macOS swaps to disk aggressively once memory pressure appears, causing 10x slowdowns during inference. The --mlock flag asks the kernel to keep the model's pages in RAM. Critical nuance: when --mlock is combined with memory-mapped loading (mmap, the llama.cpp default), the behavior is OS-dependent. On macOS, mmap'd model files reportedly bypass mlock, so --no-mmap must be passed together with --mlock to actually lock the resident pages. This combination is underdocumented; most tutorials show one flag or the other, not the interaction.
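One way to check whether the workaround is having any effect is to compare swap usage before and after a long generation. The sketch below parses the output of `sysctl vm.swapusage` on macOS; the helper names are hypothetical, and the sample line is illustrative rather than measured, assuming the usual `total = … used = … free = …` layout.

```python
import re
import subprocess

# Conversion factors to megabytes for the unit suffixes sysctl prints.
_UNITS = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def parse_swapusage(text):
    """Parse `sysctl vm.swapusage` output into megabytes per field."""
    fields = {}
    for name, value, unit in re.findall(r"(total|used|free) = ([\d.]+)([KMG])", text):
        fields[name] = float(value) * _UNITS[unit]
    return fields


def read_swapusage():
    """Run sysctl and return the parsed swap figures (macOS only)."""
    out = subprocess.run(
        ["sysctl", "vm.swapusage"], capture_output=True, text=True, check=True
    ).stdout
    return parse_swapusage(out)


# Illustrative sample line, not a real measurement.
sample = "vm.swapusage: total = 2048.00M  used = 1234.50M  free = 813.50M  (encrypted)"
print(parse_swapusage(sample))
```

Calling `read_swapusage()` before starting inference and again during a stall gives two readings to compare; a large rise in `used` suggests pages are still being swapped out.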
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:21:05.379323+00:00— report_created — created