Report #97095
[tooling] llama.cpp server concurrent requests corrupt KV cache or crash
Set `--parallel N` (where N > 1) to give the server N independent slots with isolated KV caches, so concurrent requests no longer collide. The total context window (`--ctx-size`) is divided among the slots, so divide it by N to get the context each slot actually has available.
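A minimal Python sketch of the sizing arithmetic, assuming `--ctx-size` is split evenly (integer division) across the `--parallel` slots. The helper names and the model path `model.gguf` are hypothetical; only the `llama-server` flags `-m`, `--ctx-size` and `--parallel` come from the server's CLI. It works backwards from the per-slot budget each client needs to the total context to request.

```python
import shlex


def per_slot_context(ctx_size: int, n_parallel: int) -> int:
    """Tokens each slot gets when --ctx-size is split across --parallel slots.

    Assumes an even, integer split of the total context.
    """
    if n_parallel < 1:
        raise ValueError("n_parallel must be >= 1")
    return ctx_size // n_parallel


def server_command(model_path: str, ctx_size: int, n_parallel: int) -> list[str]:
    """Build a llama-server argv; model_path is a placeholder."""
    return [
        "llama-server",
        "-m", model_path,
        "--ctx-size", str(ctx_size),
        "--parallel", str(n_parallel),
    ]


if __name__ == "__main__":
    # Hypothetical target: 4 concurrent clients, each needing ~4096 tokens.
    n_parallel = 4
    wanted_per_slot = 4096
    ctx_size = wanted_per_slot * n_parallel  # request the total, not the per-slot size
    print(shlex.join(server_command("model.gguf", ctx_size, n_parallel)))
    print("per-slot context:", per_slot_context(ctx_size, n_parallel))
```

Sizing from the per-slot budget avoids silently shrinking every slot's context when `--parallel` is raised without also raising `--ctx-size`.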
Journey Context:
Without `--parallel`, the server either processes requests sequentially or shares the KV cache between them incorrectly, which leads to corruption. Each parallel slot consumes VRAM for its KV cache, so with a fixed `--ctx-size` the total context is divided among the slots. This setting is essential for production APIs but is often missed by users who only run single interactive sessions.
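To see why extra slots cost VRAM, here is a rough estimate of KV cache size, assuming memory scales with the total `--ctx-size` and an f16 cache (2 bytes per element). The model shape (32 layers, 8 KV heads, head dimension 128) is a hypothetical example, and the estimate ignores compute buffers and any quantized cache types.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx_total: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size in bytes.

    For every layer and every token, both K and V (the leading factor 2)
    store n_kv_heads * head_dim values of bytes_per_elem bytes each.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx_total * bytes_per_elem


if __name__ == "__main__":
    MiB = 1024 ** 2
    shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)  # hypothetical model

    # Fixed --ctx-size 4096 split over 4 slots: same memory, 1024 tokens per slot.
    print(kv_cache_bytes(**shape, n_ctx_total=4096) // MiB, "MiB for 4096 total")

    # Keeping 4096 tokens per slot with 4 slots needs --ctx-size 16384.
    print(kv_cache_bytes(**shape, n_ctx_total=4 * 4096) // MiB, "MiB for 16384 total")
```

Under these assumptions, keeping each slot's context constant while adding slots grows KV cache memory linearly with N, which is the VRAM cost the paragraph above refers to.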
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:33:26.810713+00:00— report_created — created