Report #21206
[tooling] llama.cpp server performance degrades or OOMs during long-running multi-user sessions
Launch the server with --defrag-thold 0.1 to enable automatic KV-cache defragmentation when fragmentation exceeds 10%, preventing the Swiss-cheese memory pattern that causes slowdowns
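A minimal sketch of how a launcher script might assemble that command line. It assumes the binary is named llama-server and that the model is passed with -m. The model path and the helper name build_server_command are hypothetical, and flag availability and defaults vary between llama.cpp versions, so check the output of --help on your build first.

```python
import shlex


def build_server_command(model_path, defrag_thold=0.1, binary="llama-server"):
    """Return the argv list for a server launch with KV-cache defragmentation enabled.

    Assumption: the threshold is a fraction between 0 and 1 (0.1 means 10%).
    """
    if not 0.0 <= defrag_thold <= 1.0:
        raise ValueError("defrag_thold should be a fraction between 0 and 1")
    return [binary, "-m", model_path, "--defrag-thold", str(defrag_thold)]


# Hypothetical model path, for illustration only.
cmd = build_server_command("models/example.gguf")
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.Popen, which avoids shell quoting problems if the model path contains spaces.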
Journey Context:
As clients connect and disconnect, the KV cache develops holes (fragmentation). Without defragmentation, the server cannot reuse these holes efficiently, which can lead to premature OOM or throughput drops of 50% or more. Most users periodically restart the server to 'fix' this. The --defrag-thold flag (disabled by default) compacts the cache in-place during idle moments. Tradeoff: brief CPU spikes during defragmentation, which are negligible compared with the alternative of cache misses or restarts. This is essential for production API deployments.
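The idea of holes and threshold-triggered compaction can be illustrated with a toy model. This is only a sketch: the cell list, the fragmentation metric, and all function names are assumptions made for illustration and do not reproduce llama.cpp's internal data structures or its actual fragmentation formula.

```python
def fragmentation(cells):
    """Toy metric: fraction of empty cells below the highest occupied cell."""
    used = [i for i, c in enumerate(cells) if c is not None]
    if not used:
        return 0.0
    span = used[-1] + 1  # cells from index 0 up to the last occupied one
    return 1.0 - len(used) / span


def defragment(cells):
    """Move occupied cells to the front, keeping their relative order."""
    occupied = [c for c in cells if c is not None]
    return occupied + [None] * (len(cells) - len(occupied))


def maybe_defragment(cells, threshold=0.1):
    """Compact only when the toy fragmentation metric exceeds the threshold."""
    return defragment(cells) if fragmentation(cells) > threshold else cells


# Three sessions remain after others disconnected, leaving holes between them.
cache = ["a", None, "b", None, None, "c", None, None]
print(fragmentation(cache))                    # 3 used cells in a span of 6
print(maybe_defragment(cache, threshold=0.1))  # occupied cells packed at the front
```

In this model a threshold of 0.1 means compaction runs as soon as more than 10% of the occupied span is empty, which mirrors the behavior described for --defrag-thold 0.1.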
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:00:35.397117+00:00— report_created — created