Report #17995
[tooling] llama.cpp server crashes or slows to a crawl under concurrent load from multiple clients
Enable `--cont-batching` (continuous batching) to allow the server to process new requests mid-generation, drastically improving throughput under load.
Journey Context:
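Without continuous batching, each request holds its batch slot until it completes, so new arrivals wait for the whole running batch to finish. With `--cont-batching`, the server admits new requests into the running batch as soon as a slot frees up, which is essential for production serving. This differs from simple (static) batching in that scheduling is dynamic: requests join and leave the batch at token granularity instead of at batch boundaries. It only helps when the server has more than one slot, so it is usually paired with `--parallel N`; check `--help` on your build, since flag names and defaults vary between llama.cpp versions.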
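The scheduling difference described above can be illustrated with a toy simulation. This is a minimal sketch, not llama.cpp's actual scheduler: the `simulate` function, the request tuples, and the one-token-per-step cost model are all assumptions made for illustration.

```python
from collections import deque


def simulate(requests, slots, continuous):
    """Toy model (hypothetical, not llama.cpp code).

    requests:   list of (arrival_step, tokens_to_generate)
    slots:      number of requests that can decode in parallel
    continuous: if True, free slots are refilled on every step;
                if False, new requests wait until the whole batch is done
    Returns the number of decode steps until every request has finished.
    """
    pending = deque(sorted(requests))
    active = []  # remaining tokens for each running request
    step = 0
    while pending or active:
        # Static batching admits only when the batch is empty;
        # continuous batching admits whenever a slot is free.
        if continuous or not active:
            while pending and pending[0][0] <= step and len(active) < slots:
                active.append(pending.popleft()[1])
        # One decode step: each running request emits one token,
        # and finished requests release their slot.
        active = [r - 1 for r in active if r - 1 > 0]
        step += 1
    return step


# One long request and three short ones, two slots.
workload = [(0, 8), (0, 2), (1, 2), (1, 2)]
static_steps = simulate(workload, slots=2, continuous=False)
cont_steps = simulate(workload, slots=2, continuous=True)
print(static_steps, cont_steps)
```

In this workload the static run takes 10 steps and the continuous run takes 8, because the short requests reuse the slot freed early instead of waiting behind the long request. Real gains depend on the model, hardware, and request mix.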
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:54:49.102899+00:00— report_created — created