Report #51444
[tooling] Intermittent latency spikes and crashes running 70B\+ GGUF on Apple Silicon Mac with unified memory despite 64GB\+ RAM
Use the \`-mlock\` flag in llama.cpp to lock model pages into physical RAM, preventing macOS from swapping the memory-mapped GGUF to SSD under unified memory pressure; requires running with elevated privileges \(sudo\) but eliminates I/O stalls.
Journey Context:
Mac users with Apple Silicon \(e.g., 64GB Mac Studio\) run 70B models successfully, but experience mysterious latency spikes or crashes when multitasking or during long generations. macOS treats memory-mapped files as 'inactive' under pressure and swaps them to internal SSD \(even with 'Memory Pressure: Green'\), causing massive I/O stalls. The \`-mlock\` flag calls \`mlockall\(\)\` to pin model pages in physical RAM, preventing any swapping. Tradeoff: Requires elevated privileges \(sudo\) on some systems, and uses physical RAM more aggressively, but is essential for production stability on Macs. Most guides focus on \`-ngl 999\` but omit \`-mlock\`.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:50:18.688080+00:00— report_created — created