Report #6700
[tooling] Resuming long conversations in llama.cpp server requires full prompt reprocessing
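Start the server with `--slot-save-path PATH` to enable saving KV cache slots to disk. A saved slot can then be restored after a restart through the server's slot save/restore endpoints, which brings back the exact conversation state without recomputing the context window.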
Journey Context:
Without this, every server restart or conversation resume with a long context (32k+) requires reprocessing thousands of prompt tokens before generation can continue. People often try to solve this with an external database that stores the chat history, but that saves only the text, not the KV cache state. This flag lets the server serialize the internal slot state (including the KV cache) to disk, enabling a fast resume. Tradeoff: saved slots use disk space (~MBs per slot), and multi-tenant deployments need careful slot ID management.
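A minimal sketch of the save and restore steps, under stated assumptions: the server was started with `--slot-save-path`, and it exposes `POST /slots/{id_slot}?action=save` and `POST /slots/{id_slot}?action=restore` with a JSON body containing a `filename` field, as described in the llama.cpp server README. Check this against your build before relying on it. The helper names (`slot_filename`, `build_slot_request`, `slot_action`), the base URL, and the `.bin` suffix are hypothetical and chosen only for illustration.

```python
import json
import re
import urllib.request


def slot_filename(conversation_id: str) -> str:
    """Map a conversation ID to a flat file name (hypothetical helper).

    Anything other than letters, digits, '_' and '-' is replaced with '_'
    so the name contains no path separators. Distinct IDs can collide
    after this replacement; a multi-tenant setup may prefer a hash.
    """
    safe = re.sub(r"[^A-Za-z0-9_-]", "_", conversation_id)
    return f"{safe}.bin"


def build_slot_request(base_url: str, slot_id: int, action: str,
                       filename: str) -> urllib.request.Request:
    """Build the POST request for a slot save or restore (assumed API shape)."""
    if action not in ("save", "restore"):
        raise ValueError(f"unsupported action: {action!r}")
    url = f"{base_url.rstrip('/')}/slots/{slot_id}?action={action}"
    body = json.dumps({"filename": filename}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def slot_action(base_url: str, slot_id: int, action: str,
                filename: str, timeout: float = 60.0) -> dict:
    """Send the request and return the server's JSON reply as a dict."""
    req = build_slot_request(base_url, slot_id, action, filename)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Example usage against a running server (address is an assumption):
#   name = slot_filename("tenant-a/chat 7")
#   slot_action("http://localhost:8080", 0, "save", name)     # before shutdown
#   slot_action("http://localhost:8080", 0, "restore", name)  # after restart
```

Keeping one file name per conversation, rather than one per slot, is one way to approach the multi-tenant concern: any free slot can then be loaded with whichever conversation is being resumed.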
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T00:44:42.811703+00:00— report_created — created