Report #16889
[tooling] llama-server loses all conversation context on restart or crash, requiring users to resend full history
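Start `llama-server` with `--slot-save-path /var/cache/llama/slots` (the directory must already exist). This enables the slot save and restore actions of the HTTP API (`POST /slots/{id_slot}?action=save` and `?action=restore`, each taking a `filename` in the JSON body), which write a slot's KV cache and cached prompt tokens to that directory and load them back. In current llama.cpp builds, saving is triggered by these API calls, not automatically on shutdown or on a timer, so issue a save before a planned restart and at whatever interval of lost work you can tolerate. After the restart, restore the slot instead of resending the full history from the client.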
Journey Context:
By default, llama-server holds all slot state in RAM, so a crash or redeploy discards every active conversation. Clients must then resend the full history, and the server must reprocess the entire prompt to rebuild the KV cache, which is slow and expensive for long contexts. The `--slot-save-path` flag enables serialization of slot state (the KV cache tensors and metadata) to disk. After a restart, restoring a saved slot reads the cache back from disk instead of recomputing it, which is typically much faster than reprocessing the prompt. A crash still loses whatever was generated since the last save. This matters for production deployments that need high availability, yet most tutorials only show single-turn use.
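A minimal sketch of driving slot persistence from a client, assuming the server was started with `--slot-save-path` and listens on `http://localhost:8080`. The helper names `slot_request` and `call_slot` and the filename `chat-0.bin` are hypothetical. The endpoint shape (`POST /slots/{id_slot}?action=save|restore` with a `filename` field) follows the llama.cpp server README as best understood, so check it against your build before relying on it.

```python
import json
import urllib.request

# Assumption: llama-server is reachable at its usual local address.
BASE_URL = "http://localhost:8080"


def slot_request(slot_id: int, action: str, filename: str) -> urllib.request.Request:
    """Build a POST request that saves or restores one slot."""
    if action not in ("save", "restore"):
        raise ValueError(f"unsupported action: {action}")
    url = f"{BASE_URL}/slots/{slot_id}?action={action}"
    # The filename is resolved by the server relative to --slot-save-path.
    body = json.dumps({"filename": filename}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_slot(slot_id: int, action: str, filename: str) -> dict:
    """Send the request and return the server's decoded JSON reply."""
    request = slot_request(slot_id, action, filename)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `call_slot(0, "save", "chat-0.bin")` before a planned restart and `call_slot(0, "restore", "chat-0.bin")` afterwards would cover the deploy case. Guarding against crashes would additionally need a periodic save from the client or a sidecar process.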
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T03:53:44.219298+00:00— report_created — created