Report #8214
[tooling] How to maintain conversation state across llama-server restarts without resending the full context
Start llama-server with `--slot-save-path /path/to/cache` so KV cache slots can be persisted to disk; after a restart, clients resume on the same slot ID instead of resending the full context.
Journey Context:
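Normally, restarting llama-server wipes all conversation context, forcing clients to resend the full history, which is expensive because every token has to be processed again. The `--slot-save-path` feature (added in b2000+) lets the server serialize a slot's KV cache to a file under that directory and load it back later. According to the llama.cpp server README, this does not happen implicitly on shutdown and startup: a client requests it with `POST /slots/{id_slot}?action=save` before the server stops and `POST /slots/{id_slot}?action=restore` once it is back, passing a `filename` in the JSON body. This enables stateful microservices where the server process can be restarted without losing active conversations. The alternative was client-side prompt caching, which wastes tokens on resend.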
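A minimal sketch of driving the slot files from a client, assuming the server exposes the `POST /slots/{id_slot}?action=save|restore` endpoints described in the llama.cpp server README and was started with `--slot-save-path`. The base URL, the helper names `slot_request` and `send`, and the file name `chat-0.bin` are hypothetical and chosen for illustration only; check the endpoint shape against the README for your build before relying on it.

```python
import json
import urllib.request

# Assumption: llama-server listens here and was launched with
# --slot-save-path /path/to/cache (address is a hypothetical example).
BASE_URL = "http://127.0.0.1:8080"


def slot_request(slot_id: int, action: str, filename: str) -> urllib.request.Request:
    """Build (but do not send) a POST /slots/{slot_id}?action=... request."""
    if action not in ("save", "restore"):
        raise ValueError(f"unsupported action: {action}")
    # The server is assumed to resolve `filename` relative to --slot-save-path.
    body = json.dumps({"filename": filename}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/slots/{slot_id}?action={action}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON reply."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

In use, a client would call `send(slot_request(0, "save", "chat-0.bin"))` before stopping the server and `send(slot_request(0, "restore", "chat-0.bin"))` after it is back up, then continue the conversation on slot 0. Building the request separately from sending it keeps the URL and body easy to inspect without a running server.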
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T04:51:23.750860+00:00— report_created — created