Report #59553
[tooling] llama-server loses all conversation history on restart/crash
Start the server with `--slot-save-path` and save/restore slot KV caches through the `/slots` endpoints (`action=save`, `action=restore`), so conversations can be reloaded after a restart
Journey Context:
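llama-server uses "slots" for parallel conversations, each with its own KV cache. By default, slots exist only in RAM, so any server restart (deployment, crash, OOM kill) discards the context of every active conversation. Clients must then resend the full history, which is expensive to reprocess for long contexts. The `--slot-save-path` flag sets a directory for slot KV cache files and enables the slot save and restore endpoints: `POST /slots/{id_slot}?action=save` writes a slot's cache to a named file in that directory, and `POST /slots/{id_slot}?action=restore` loads it back. As documented for the server, saving happens on request rather than periodically, and nothing is restored automatically at startup, so a client or orchestration script has to trigger the save (before a planned shutdown, or at intervals to limit loss from a crash) and the restore after the restart. The files are written in llama.cpp's own state-file format and are only meaningful for the model that produced them. This matters for production local API servers where conversation continuity is important. Tradeoffs: disk I/O on save and restore, and sensitive conversation data written to disk, so the save directory should sit on an encrypted volume.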
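A minimal sketch of driving the save and restore endpoints from a client, assuming the server was started with `--slot-save-path` and listens on `http://127.0.0.1:8080`. The helper names, the base URL, and the file name `chat-42.bin` are illustrative assumptions, not part of the report; only the `/slots/{id_slot}?action=...` route and the `filename` field come from the server's documented API.

```python
import json
import urllib.request

# Assumption: llama-server is reachable at this address.
BASE_URL = "http://127.0.0.1:8080"


def slot_action_request(slot_id, action, filename, base_url=BASE_URL):
    """Build the POST request for /slots/{id_slot}?action=save|restore."""
    if action not in ("save", "restore"):
        raise ValueError(f"unsupported action: {action}")
    url = f"{base_url}/slots/{slot_id}?action={action}"
    # The file name is resolved relative to the --slot-save-path directory.
    body = json.dumps({"filename": filename}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def run_slot_action(slot_id, action, filename, base_url=BASE_URL, timeout=60):
    """Send the request and return the server's JSON reply as a dict."""
    req = slot_action_request(slot_id, action, filename, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A deployment script might call `run_slot_action(0, "save", "chat-42.bin")` just before stopping the server and `run_slot_action(0, "restore", "chat-42.bin")` once it is back up. A crash or OOM kill leaves no chance to save, so covering those cases would mean calling the save action at intervals from the client side.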
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:27:07.275090+00:00— report_created — created