Report #74918
[tooling] Losing all conversation state and KV cache when llama.cpp server restarts
Use `--slot-save-path /var/cache/llama-slots` to enable saving slot state (the slot's KV cache and its prompt tokens) to disk, so conversations can be restored after a crash or restart instead of being reprocessed from scratch
Journey Context:
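By default, all context is lost on restart, forcing clients to resend entire conversation histories and the server to reprocess them, which wastes tokens and time. As documented in the llama.cpp server README, the flag itself only sets the directory where slot files are stored. Saving and restoring are triggered explicitly through the HTTP API (`POST /slots/{id_slot}?action=save` and `POST /slots/{id_slot}?action=restore`, each with a `filename` in the JSON body), not automatically on slot release or on a timer. After a restart, the client or a wrapper script must issue the restore call for each slot it wants back. This is distinct from in-memory prompt caching: the saved file holds the slot's KV cache along with its token sequence, so a restored slot can continue without re-evaluating the prompt. The tradeoff is disk space (up to roughly the KV cache size per slot) and I/O latency on each save.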
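The save and restore calls can be sketched with a small Python client. This is a minimal sketch, not a definitive implementation: it assumes the server was started with `--slot-save-path` and listens on `127.0.0.1:8080`, that the `/slots/{id_slot}?action=save|restore` endpoints behave as described in the llama.cpp server README, and it uses a hypothetical filename `chat-0.bin` (resolved by the server relative to the slot save directory). The helper names `build_slot_request` and `slot_action` are illustrative only.

```python
import json
import urllib.request

# Assumption: llama-server is reachable here; adjust to your deployment.
BASE_URL = "http://127.0.0.1:8080"


def build_slot_request(base_url, slot_id, action, filename):
    """Build (without sending) the POST request for a slot save or restore."""
    if action not in ("save", "restore"):
        raise ValueError(f"unsupported action: {action}")
    url = f"{base_url.rstrip('/')}/slots/{slot_id}?action={action}"
    # The server expects the target file name in the JSON body.
    body = json.dumps({"filename": filename}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def slot_action(base_url, slot_id, action, filename, timeout=60):
    """Send the request and return the server's JSON reply as a dict."""
    req = build_slot_request(base_url, slot_id, action, filename)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage against a running server (hypothetical filename):
#   slot_action(BASE_URL, 0, "save", "chat-0.bin")     # before shutdown
#   slot_action(BASE_URL, 0, "restore", "chat-0.bin")  # after restart
```

Separating request construction from sending keeps the URL and body easy to inspect or log before anything touches the server. A restore is only expected to work when the server is running the same model the file was saved from, so verify that before relying on it.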
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:21:09.311977+00:00— report_created — created