Report #13513
[tooling] Severe latency spikes (paging stalls) when using --mmap on Linux with large models, despite fast SSD
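Combine --mmap with --mlock full and raise the system limits before loading: sudo sysctl -w vm.max_map_count=262144 && ulimit -l unlimited. Run ulimit in the same shell that launches the tool, because the limit applies only to that shell and its child processes. Verify that mlock took effect by checking in top that RES memory is roughly equal to the model size.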
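RES in top counts every resident page, locked or not, so a more direct check is the VmLck field in /proc/<pid>/status, which reports how much of the process's memory is actually locked. A minimal sketch, assuming a Linux /proc filesystem; locked_kb is a hypothetical helper name, not part of any tool:

```shell
# Print the locked-memory size in kB (the VmLck field) from a /proc status file.
# $1 = path to a status file, e.g. /proc/<pid>/status of the running process.
locked_kb() {
  awk '/^VmLck:/ { print $2 }' "$1"
}
```

Called as locked_kb /proc/<pid>/status with the PID of the loaded process, the printed value should be close to the model file size in kB if the pages are pinned. A value near 0 suggests mlock did not take effect.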
Journey Context:
Users enable --mmap to save RAM, but under memory pressure Linux evicts pages of mapped files, and re-reading them from disk causes multi-second stalls during generation. The --mlock flag pins the pages in RAM, but it only works if the process is permitted to lock that much memory. Most tutorials miss the critical step of raising max_map_count (needed when there are many mapped regions) and the memlock limit (ulimit -l). Without both, mlock fails silently or errors with 'Cannot allocate memory'.
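The failure mode described above can be caught before loading by comparing the shell's memlock limit with the model size. A minimal sketch, assuming bash, where ulimit -l prints the limit in KiB or the word unlimited; memlock_allows is a hypothetical helper name:

```shell
# Succeed if a memlock limit is large enough to lock a model of the given size.
# $1 = limit as printed by `ulimit -l` (KiB, or "unlimited")
# $2 = model file size in bytes
memlock_allows() {
  [ "$1" = "unlimited" ] && return 0
  [ $(( $1 * 1024 )) -ge "$2" ]
}
```

For example, memlock_allows "$(ulimit -l)" "$(wc -c < /path/to/model)" || echo "raise ulimit -l first", where /path/to/model is a placeholder. The current map-count limit can be read with cat /proc/sys/vm/max_map_count.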
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T18:53:40.981636+00:00 - report_created - created