Report #17653
[tooling] Losing chat session context when restarting llama-server or deploying new version
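Start llama-server with --slot-save-path /tmp/slots. Before shutdown, save each active slot with POST /slots/{id_slot}?action=save and a JSON body such as {"filename": "session.bin"}. After restarting with the same flag, restore each slot explicitly with POST /slots/{id_slot}?action=restore and the same filename. Slots are not restored automatically; the flag only sets the directory that save and restore read from and write to.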
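A minimal sketch of the save and restore round trip, assuming the server exposes per-slot actions as POST /slots/{id_slot}?action=save|restore|erase with a JSON "filename" body, as described in the llama.cpp server README at the time of writing. The base URL, slot id, and filename here are hypothetical; check the endpoint shape against your build before relying on it.

```python
import json
import urllib.request

# Assumed address of a local llama-server started with --slot-save-path.
BASE_URL = "http://127.0.0.1:8080"

VALID_ACTIONS = ("save", "restore", "erase")


def slot_action_request(base_url, slot_id, action, filename=None):
    """Build (but do not send) a POST request for a slot action.

    The filename is relative to the directory given by --slot-save-path;
    "erase" does not need one.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown slot action: {action!r}")
    if action in ("save", "restore") and not filename:
        raise ValueError(f"action {action!r} requires a filename")

    url = f"{base_url.rstrip('/')}/slots/{slot_id}?action={action}"
    body = {} if filename is None else {"filename": filename}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Before shutdown, something like `send(slot_action_request(BASE_URL, 0, "save", "chat-42.bin"))` would write the slot to disk; after the restart, the same call with "restore" would load it back. Building the request separately from sending it keeps the URL and body easy to inspect without a running server.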
Journey Context:
Most users treat llama-server as stateless, but the KV cache represents expensive computation (prompt processing). For applications with long-lived conversations, re-processing the full history after every server restart wastes time and compute. The slot save feature serializes a slot's KV cache to disk. Tradeoffs: disk space (roughly equal to the size of the cached KV state, e.g., 10GB for long contexts), and restoration only works if the model and quantization are identical to those used when the slot was saved. The alternative, client-side history management, means resending the full conversation, which the server must re-tokenize and process again from scratch; slot save preserves the exact KV state.
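To get a feel for the disk-space tradeoff, the KV cache size can be estimated from the model shape. The sketch below uses the standard formula (keys plus values, per layer, per KV head, per token) and assumes an unquantized f16 cache. The dimensions are illustrative values for a 7B-class model with 32 layers, 32 KV heads, and a head dimension of 128; they are assumptions, not figures from this report. The actual file on disk may differ somewhat, since it depends on how many tokens the slot holds and on any metadata the server writes.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_element=2):
    """Estimate KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_element=2 corresponds to an f16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_element


def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


# Illustrative 7B-class shape (assumed, not taken from the report).
per_token = kv_cache_bytes(32, 32, 128, n_tokens=1)
print(f"per token: {per_token / 2**20:.2f} MiB")

for n_tokens in (4096, 20480):
    size = kv_cache_bytes(32, 32, 128, n_tokens)
    print(f"{n_tokens} tokens: {to_gib(size):.1f} GiB")
```

Under these assumptions each cached token costs 0.5 MiB, so a slot reaches the 10GB range at roughly 20,000 tokens. Models that use grouped-query attention have fewer KV heads and a proportionally smaller cache.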
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T05:55:50.278346+00:00— report_created — created