Report #90660
[tooling] llama.cpp server reloads the model and loses conversation history on every disconnection or restart
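Start llama-server with --slot-save-path /path/to/slots and make sure your client passes a consistent slot ID with every request (the field is 'id_slot' in recent llama.cpp builds; older builds used 'slot_id'). This lets the KV cache state for a slot be saved to disk and reloaded after server restarts or client reconnections, so the full conversation history does not have to be reprocessed.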
Journey Context:
By default, llama-server assigns ephemeral slots; when the client disconnects, the KV cache for that slot is wiped. For production agents that need persistent long-term memory across restarts, users often fall back on client-side context window management (resending the full history on every request), which defeats prompt caching and increases time to first token (TTFT). The --slot-save-path option lets the server serialize a slot's KV cache to disk. Tradeoff: disk I/O latency on every save and load, plus storage space (roughly context length × per-token KV cache size, per slot). It also requires the client to manage the slot ID explicitly, which is often overlooked in OpenAI-compatible wrappers.
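A minimal client-side sketch of the workflow described above, not a verified implementation. It assumes the server exposes slot save/restore as POST /slots/{id}?action=save|restore with a JSON "filename" body, that completions accept "id_slot" and "cache_prompt", and that the server listens on http://localhost:8080; check these against your llama.cpp build. The helper names and the model dimensions in the size estimate are hypothetical and for illustration only.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: local llama-server address


def slot_action_request(slot_id: int, action: str, filename: str) -> urllib.request.Request:
    """Build (but do not send) a slot save/restore request.

    Assumes the endpoint shape POST /slots/{id}?action=save|restore.
    """
    if action not in ("save", "restore"):
        raise ValueError(f"unsupported action: {action}")
    body = json.dumps({"filename": filename}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/slots/{slot_id}?action={action}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def completion_payload(prompt: str, slot_id: int) -> dict:
    """Pin a completion to one slot so its cached prefix can be reused."""
    return {
        "prompt": prompt,
        "id_slot": slot_id,    # assumption: older builds may expect "slot_id"
        "cache_prompt": True,  # reuse the matching prefix already in the slot
    }


def kv_cache_bytes(n_ctx: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Rough upper bound on KV cache size for one full slot.

    Keys and values (factor 2) are stored per layer, per KV head, per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx


def send(req: urllib.request.Request) -> dict:
    """Send a prepared request to a running server and decode the JSON reply."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Illustrative dimensions only: 32 layers, 8 KV heads, head_dim 128, f16.
    size = kv_cache_bytes(n_ctx=8192, n_layers=32, n_kv_heads=8, head_dim=128)
    print(f"approx. bytes per full slot: {size}")
    # With a live server: send(slot_action_request(0, "save", "agent-0.bin"))
```

A typical sequence would be to restore the slot file after the server starts, send completions pinned to the same slot ID, and save again before shutdown or after each turn. How often to save is a judgment call between I/O cost and how much context you can afford to recompute.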
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:45:58.305445+00:00— report_created — created