Report #72070
[tooling] llama-server losing context between API calls or failing concurrent user sessions
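Start llama-server with more than one slot and a slot save path, e.g. --parallel N --slot-save-path /tmp/slots with N > 1 (in current llama.cpp builds the slot count is set by --parallel / -np, while --slots only toggles the slot monitoring endpoint). Clients then pass a fixed slot id ("id_slot"; "slot_id" in older builds) in each completion request to keep a persistent context across connections.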
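A minimal client-side sketch of a slot-pinned request, assuming the server listens on http://localhost:8080 and serves the /completion endpoint. The slot field name differs between llama.cpp builds ("id_slot" or "slot_id"), so it is a parameter here; check your server's README before relying on either. build_payload and complete are hypothetical helper names, not part of llama.cpp.

```python
import json
import urllib.request


def build_payload(prompt, slot, slot_field="id_slot", n_predict=128):
    """Build a completion request body pinned to one server slot."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,
        # Ask the server to reuse the slot's cached prompt prefix
        # (assumed to be supported by your build).
        "cache_prompt": True,
        # Field name varies by build; verify against your server.
        slot_field: slot,
    }


def complete(base_url, payload, timeout=60):
    """POST the payload to <base_url>/completion and return the parsed JSON reply."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, a call such as complete("http://localhost:8080", build_payload("Hello", slot=0)) would send every request for that session to slot 0. The payload builder is kept separate so the request body can be inspected without a live server.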
Journey Context:
In the default single-slot mode, the KV cache is reset on each request. Each slot acts as an independent context buffer, which is critical for chatbots where multiple users need different contexts at the same time without loading the 70B weights more than once.
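The idea of one context buffer per user can be sketched on the client side as a small allocator that pins each session to one slot index. This is an illustration only: SlotPool is a hypothetical helper, and the sketch assumes slot indices run from 0 to N-1 and that the application, not the server, decides which session owns which slot.

```python
class SlotPool:
    """Pin each session to one of n_slots server slots (hypothetical helper)."""

    def __init__(self, n_slots):
        if n_slots < 1:
            raise ValueError("n_slots must be >= 1")
        self._free = list(range(n_slots))  # slot indices not yet assigned
        self._by_session = {}              # session id -> slot index

    def acquire(self, session_id):
        """Return the session's slot, assigning a free one on first use."""
        if session_id in self._by_session:
            return self._by_session[session_id]
        if not self._free:
            raise RuntimeError("all slots are in use")
        slot = self._free.pop(0)
        self._by_session[session_id] = slot
        return slot

    def release(self, session_id):
        """Free the session's slot so another session can take it."""
        slot = self._by_session.pop(session_id, None)
        if slot is not None:
            self._free.append(slot)
```

The returned index is what a client would send as the slot id in its completion requests. The sketch does not save or restore slot contents, so a released slot may still hold the previous session's context until the next request overwrites it.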
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:32:58.038752+00:00 - report_created - created