Report #21336
[tooling] llama.cpp server loses all conversation context on restart
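Start `llama-server` with `--slot-save-path /var/cache/llama/slots`. This enables the slot save/restore endpoints. A client can ask the server to write a slot's KV cache to a file in that directory (`POST /slots/{id}?action=save`) and later load it back into a slot (`POST /slots/{id}?action=restore`), so a session can resume after a restart without re-processing its history. Saving is not automatic: only caches that were explicitly saved before a restart or crash can be restored.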
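A minimal client-side sketch using only the Python standard library. It assumes the server was started as, for example, `llama-server -m model.gguf --slot-save-path /var/cache/llama/slots`, listens on the default `http://127.0.0.1:8080`, and accepts a JSON body with a `filename` field on the save/restore actions, as described in the llama.cpp server README. The server API has changed between releases, so check the README for your build. The helper names `slot_action_request` and `send` are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Actions exposed under /slots/{id} when --slot-save-path is set (per the server README).
SLOT_ACTIONS = {"save", "restore", "erase"}


def slot_action_request(
    base_url: str, slot_id: int, action: str, filename: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /slots/{slot_id}?action=... request."""
    if action not in SLOT_ACTIONS:
        raise ValueError(f"unsupported slot action: {action!r}")
    query = urllib.parse.urlencode({"action": action})
    url = f"{base_url.rstrip('/')}/slots/{slot_id}?{query}"
    body = {}
    if action in ("save", "restore"):
        # The server resolves the name inside --slot-save-path, so only a
        # bare filename makes sense; reject path separators up front.
        if not filename or "/" in filename or "\\" in filename:
            raise ValueError("filename must be a bare name, e.g. 'session-42.bin'")
        body["filename"] = filename
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send a request built by slot_action_request and decode the JSON reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Typical use: call `send(slot_action_request(base, 0, "save", "session-42.bin"))` before a planned shutdown (or periodically, so a crash loses less), then after the restart call it again with `"restore"`. Keep routing that conversation to the same slot (the `id_slot` field of `/completion` in current builds; older builds called it `slot_id`) so the server can reuse the restored prefix. Building requests separately from sending them keeps the URL and body logic testable without a running server.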
Journey Context:
By default, llama-server keeps each slot's KV cache only in RAM, so a restart discards it. For production APIs, rebuilding that context is expensive: the server has to re-process each conversation's entire history before it can generate again. The `--slot-save-path` flag adds on-disk persistence of slot caches, driven by explicit save and restore calls. Critical caveat: a saved cache is only valid for the exact model file and quantization format that produced it; after changing the model, treat existing cache files as invalid.
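One way to respect that caveat, sketched under assumptions: embed a fingerprint of the model file in every cache filename, so that after a model swap the client finds no matching file and falls back to re-processing history instead of asking the server to restore an incompatible cache. This naming scheme and the helpers `model_fingerprint`, `slot_cache_filename`, `find_restorable` and `stale_caches` are hypothetical, not llama.cpp behavior. `find_restorable` also assumes the client can read the `--slot-save-path` directory (same host or a shared volume).

```python
import hashlib
from pathlib import Path
from typing import List, Optional


def model_fingerprint(model_path: str, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: short SHA-256 digest of the model file's bytes."""
    h = hashlib.sha256()
    with open(model_path, "rb") as f:
        # Stream in chunks so multi-GB GGUF files are not loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()[:16]


def slot_cache_filename(session_id: str, fingerprint: str) -> str:
    """Hypothetical naming scheme: bare '<session>.<fingerprint>.bin' name."""
    # Replace anything but letters, digits, '-' and '_' so the result stays a bare filename.
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in session_id)
    return f"{safe}.{fingerprint}.bin"


def find_restorable(save_dir: str, session_id: str, fingerprint: str) -> Optional[str]:
    """Return the cache filename for this session and model, or None if absent."""
    name = slot_cache_filename(session_id, fingerprint)
    return name if (Path(save_dir) / name).is_file() else None


def stale_caches(save_dir: str, fingerprint: str) -> List[str]:
    """List .bin files in save_dir that were written for a different model."""
    suffix = f".{fingerprint}.bin"
    return sorted(p.name for p in Path(save_dir).glob("*.bin") if not p.name.endswith(suffix))
```

Hashing a multi-gigabyte GGUF file takes a while, so compute the fingerprint once at startup. A full-content hash also catches re-quantized or re-downloaded files that keep the same filename, which a name-only check would miss; the files reported by `stale_caches` can then be deleted.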
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:13:38.098712+00:00— report_created — created