Report #44662
[tooling] Linux system with 64GB RAM becomes unresponsive due to swap thrashing when loading 70B GGUF
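Use --no-mmap together with --mlock in llama.cpp to load the weights into allocated RAM and lock them there, so the kernel cannot page them out. Optionally, reduce TLB pressure with hugepages: enable transparent hugepages or, if THP is disabled, reserve a static pool via /proc/sys/vm/nr_hugepages. Only explicit hugetlb allocations can use that pool.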
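Before launching with these flags, it is worth confirming that the process is allowed to lock that much memory. On Linux, locking is capped by RLIMIT_MEMLOCK (ulimit -l) unless the process has CAP_IPC_LOCK. The following is a minimal preflight sketch. It assumes the GGUF file size is a reasonable estimate of the bytes to lock. The helper names are hypothetical, and the binary name and model path are placeholders.

```python
import os
import resource


def memlock_limit_bytes():
    """Soft RLIMIT_MEMLOCK in bytes, or None when the limit is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return None if soft == resource.RLIM_INFINITY else soft


def can_lock(required_bytes):
    """True if the soft memlock limit covers required_bytes.

    Does not account for CAP_IPC_LOCK, which lets a process lock memory
    beyond RLIMIT_MEMLOCK, so a False result can be a false alarm.
    """
    limit = memlock_limit_bytes()
    return limit is None or limit >= required_bytes


def build_argv(binary, model_path, extra=()):
    """llama.cpp command line with mmap disabled and the weights locked."""
    return [binary, "-m", model_path, "--no-mmap", "--mlock", *extra]


def preflight(binary, model_path):
    """Return argv if locking looks possible, else raise with a hint.

    Assumption: the GGUF file size approximates the bytes to lock; the
    real resident footprint is larger (KV cache, compute buffers).
    """
    needed = os.path.getsize(model_path)
    if not can_lock(needed):
        raise RuntimeError(
            f"RLIMIT_MEMLOCK ({memlock_limit_bytes()} bytes) < model size "
            f"({needed} bytes); raise 'ulimit -l' or grant CAP_IPC_LOCK"
        )
    return build_argv(binary, model_path)
```

You might use it as `subprocess.run(preflight("llama-server", "/models/model-70b.gguf"))`, where the path is hypothetical. The check only reads the soft limit. Raising it, or granting the capability, is still an administrative step.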
Journey Context:
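By default llama.cpp mmap()s the GGUF file, so the weights are file-backed pages in the page cache, and the kernel may evict them under memory pressure. When memory is overcommitted on a RAM-constrained Linux host, evicted weight pages are re-read from disk on the next access while anonymous memory is pushed out to swap. The system then thrashes ('swap death'). --no-mmap makes llama.cpp allocate ordinary buffers and read the weights into them, and --mlock then locks the loaded weights in physical RAM so the kernel cannot page them out. Tradeoff: startup is slower (a full sequential read from disk), but inference is deterministic and swap-free, which matters for latency-sensitive production workloads. --mlock also works without --no-mmap, in which case llama.cpp locks the memory-mapped region. The point of --no-mmap is therefore to load everything up front, not to make locking possible.
Common mistake: assuming the lock succeeded. Locking is capped by RLIMIT_MEMLOCK (ulimit -l) unless the process has CAP_IPC_LOCK. If the limit is below the 40 GB+ footprint, llama.cpp logs a failed-mlock warning and keeps running, and whatever could not be locked stays pageable.
Supplement: to reduce TLB misses on a 40 GB+ footprint, enable transparent hugepages or reserve static hugepages via /proc/sys/vm/nr_hugepages. That sysctl sizes the pool for the default hugepage size, which is usually 2 MB on x86-64. 1 GB pages need hugepagesz=1G on the kernel command line or the per-size file under /sys/kernel/mm/hugepages/. Static hugepages can only be used through explicit hugetlb mappings (MAP_HUGETLB or hugetlbfs) and are unavailable to ordinary allocations. Confirm that the loader actually requests them before reserving a large pool on a 64 GB host.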
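As a rough sizing aid for the hugepage step, the next sketch computes how many pages a given footprint needs. It also parses the two kernel files that report the default hugepage size and the active THP mode. The 40 GiB footprint and all function names are assumptions for illustration. The code only reads files and changes no settings.

```python
import re

GIB = 1 << 30
FOOTPRINT = 40 * GIB  # assumption: ~40 GiB of weights for a quantized 70B model


def hugepage_size_bytes(meminfo_text):
    """Default hugepage size from /proc/meminfo text ('Hugepagesize: 2048 kB')."""
    m = re.search(r"^Hugepagesize:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if m is None:
        raise ValueError("Hugepagesize not found in meminfo text")
    return int(m.group(1)) * 1024


def hugepages_needed(footprint_bytes, page_bytes):
    """Pages required to cover footprint_bytes (ceiling division)."""
    return -(-footprint_bytes // page_bytes)


def thp_mode(enabled_text):
    """Active mode from /sys/kernel/mm/transparent_hugepage/enabled,
    e.g. 'always [madvise] never' -> 'madvise'."""
    m = re.search(r"\[(\w+)\]", enabled_text)
    if m is None:
        raise ValueError("no bracketed THP mode found")
    return m.group(1)


def read_text(path):
    """Read a small kernel-exported text file."""
    with open(path) as f:
        return f.read()


# Worked numbers, no system access needed:
#   2 MiB pages: 40 GiB / 2 MiB = 20480 pages
#   1 GiB pages: 40 GiB / 1 GiB = 40 pages
```

On a live host, you could call `hugepages_needed(FOOTPRINT, hugepage_size_bytes(read_text("/proc/meminfo")))` to get the count. Writing that value (for example `sysctl -w vm.nr_hugepages=20480` as root) reserves the memory immediately, and ordinary allocations can no longer use it. The kernel may also grant fewer pages than requested if memory is fragmented, so read /proc/sys/vm/nr_hugepages back afterwards.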
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:26:08.373479+00:00— report_created — created