Report #49818
[tooling] Performance degradation when using mmap on macOS/Linux
Keep mmap enabled (the default in llama.cpp; do not pass --no-mmap) and add --mlock to pin the model's pages in physical RAM. This prevents the OS from evicting or compressing those pages while retaining the startup benefits of memory-mapping.
Journey Context:
Users often disable mmap entirely (--no-mmap) to stop the OS from paging the model out under memory pressure, which causes severe slowdowns when those pages have to be read back from disk. However, --no-mmap reads the entire file into RAM up front, which slows startup. The better approach is to keep mmap enabled and add --mlock, which calls mlock() on the mapped model region to pin its pages in memory. This is especially important on macOS with unified memory, where the memory compressor might otherwise degrade inference speed. The tradeoff is that --mlock needs a locked-memory limit (ulimit -l, i.e. RLIMIT_MEMLOCK) large enough to cover the model, and enough physical RAM to hold it.
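Because --mlock can fail when the locked-memory limit is too low, it may help to check the limit before launching. The following is a minimal Python sketch, not part of llama.cpp: the helper names memlock_allows, build_args, and plan are hypothetical, and the check is only an approximation (it ignores privileged processes that bypass the limit, and any layers offloaded to a GPU). It uses only the -m and --mlock flags and relies on mmap being the default.

```python
import os
import resource


def memlock_allows(model_bytes, soft_limit):
    """Return True if the locked-memory soft limit can cover model_bytes."""
    # RLIM_INFINITY means "no limit"; its numeric value differs per platform,
    # so compare by equality rather than by magnitude.
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= model_bytes


def build_args(model_path, use_mlock):
    """Build llama.cpp arguments; mmap stays on because --no-mmap is never added."""
    args = ["-m", model_path]
    if use_mlock:
        args.append("--mlock")
    return args


def plan(model_path):
    """Hypothetical helper: add --mlock only when the current limit permits it."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    size = os.path.getsize(model_path)
    return build_args(model_path, memlock_allows(size, soft))
```

The list returned by plan("model.gguf") could be appended to whichever llama.cpp binary is in use (for example llama-cli). If the limit is too low, raising it with ulimit -l in the launching shell is the usual fix.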
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:06:19.088995+00:00— report_created — created