Report #92495
[tooling] llama.cpp server loses conversation history on restart or connection drop
Start `llama-server` with `--slot-save-path /var/cache/llama/slots` and `--slots`. With a save path configured, a slot's KV cache (its processed prompt history) can be saved to disk and restored later, so clients can resume a session by slot ID after a server restart without recomputing a long context.
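A minimal client-side sketch of the save and restore step, assuming the slot endpoints described in the llama.cpp server README (`POST /slots/{id_slot}?action=save|restore|erase` with a JSON `filename` that is resolved relative to `--slot-save-path`). The base URL, the helper names, and the file name `session-a.bin` are illustrative assumptions, not part of the original report; check the endpoint shape against your server build before relying on it.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the server listens on llama-server's usual local address.
BASE_URL = "http://127.0.0.1:8080"


def slot_action_request(slot_id: int, action: str,
                        filename: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /slots/{slot_id}?action=... request."""
    if action not in {"save", "restore", "erase"}:
        raise ValueError(f"unsupported slot action: {action}")
    if action in {"save", "restore"} and not filename:
        raise ValueError(f"action '{action}' needs a filename")

    query = urllib.parse.urlencode({"action": action})
    url = f"{BASE_URL}/slots/{slot_id}?{query}"
    # save/restore carry the file name in a JSON body; erase sends no body.
    body = json.dumps({"filename": filename}).encode() if filename else None
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request to a running server and decode the JSON reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode())


if __name__ == "__main__":
    # Only prints the request that would be sent; no server is contacted here.
    req = slot_action_request(0, "save", "session-a.bin")
    print(req.get_method(), req.full_url)
```

In this sketch a client would call `send(slot_action_request(0, "save", "session-a.bin"))` before a planned restart and `send(slot_action_request(0, "restore", "session-a.bin"))` afterwards, then keep sending completions pinned to the same slot. Building the request separately from sending it keeps the URL and payload easy to inspect without a live server.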
Journey Context:
By default, llama-server holds conversation state in RAM, so a restart wipes all context. For production APIs serving RAG or coding agents, recomputing a 32k-token context can take minutes, depending on hardware. The `--slot-save-path` option lets the server serialize a slot's KV cache to disk (similar to VM hibernation). Combined with explicit slot IDs and `--timeout`, this makes sessions resumable. Many users don't know the option exists because it is buried in the server docs and requires disk space (potentially GBs per slot), but it is essential for stateful LLM services.
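To put the "GBs per slot" figure in context, here is a rough back-of-the-envelope estimate of KV cache size. It uses the standard formula (one K and one V tensor per layer) and illustrative model shapes that are assumptions, not values from the report: 32 layers, head dimension 128, 16-bit cache entries, and either 32 or 8 KV heads. The actual size of a saved slot file depends on how many tokens are in the cache and on the server's file format, so treat this as an order-of-magnitude guide only.

```python
def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size in bytes for n_tokens cached tokens."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * n_tokens


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Assumed shapes for illustration: full multi-head vs grouped-query attention.
    for label, kv_heads in (("32 KV heads", 32), ("8 KV heads", 8)):
        size = kv_cache_bytes(32 * 1024, n_layers=32,
                              n_kv_heads=kv_heads, head_dim=128)
        print(f"{label}: {size / GIB:.1f} GiB for a 32k-token context")
```

Under these assumptions a full 32k-token context comes to roughly 16 GiB with 32 KV heads and 4 GiB with 8 KV heads, which is why the slot directory needs to sit on a volume with plenty of free space.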
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:50:46.239293+00:00— report_created — created