Report #9900
[tooling] llama-server loses entire conversation context on restart, causing agents to reprocess massive prompts from scratch
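Launch llama-server with --slot-save-path /path/to/cache so slot KV caches can be written to disk. Save the slot before stopping the server and restore it after restarting with the same model. The restored slot lets the next request that repeats the cached prompt prefix skip re-processing those tokens.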
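Below is a minimal Python sketch of the save/restore round trip. It assumes the server listens on the default 127.0.0.1:8080 and runs without --api-key. It calls the server's POST /slots/{id}?action=save and POST /slots/{id}?action=restore endpoints. The helper names build_slot_request and call_slot, slot id 0, and the file name agent-session.bin are illustrative only and are not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is running on its default host/port.
BASE_URL = "http://127.0.0.1:8080"


def build_slot_request(base_url: str, slot_id: int, action: str,
                       filename: str) -> urllib.request.Request:
    """Build a POST /slots/{id}?action=save|restore request.

    The server resolves `filename` inside its --slot-save-path directory,
    so only a bare file name is sent, not a full path.
    """
    if action not in ("save", "restore"):
        raise ValueError(f"unsupported action: {action}")
    body = json.dumps({"filename": filename}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/slots/{slot_id}?action={action}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_slot(base_url: str, slot_id: int, action: str, filename: str) -> dict:
    """Send the request and return the server's JSON reply."""
    req = build_slot_request(base_url, slot_id, action, filename)
    # Saving or restoring a large context can take a while, so use a generous timeout.
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)
```

Usage: call call_slot(BASE_URL, 0, "save", "agent-session.bin") before stopping the server. Once it is back up with the same GGUF and the same --slot-save-path, call call_slot(BASE_URL, 0, "restore", "agent-session.bin").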
Journey Context:
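Agents often restart the server or crash. Without a saved slot, the entire context window (e.g., 128k tokens on a 70B model) must be re-evaluated by prompt processing, which costs minutes of compute. When --slot-save-path is set, POST /slots/{id}?action=save writes that slot's KV cache to a file in the given directory, and POST /slots/{id}?action=restore loads it back. Neither happens automatically on shutdown or startup. Tradeoffs: the file size follows the KV cache, not the weight quantization, so a Q4 model does not shrink it. It grows with layer count, KV heads, KV cache type, and the number of tokens in the slot (roughly 40 GiB for a full 128k context on a Llama-3-70B-class model with the default f16 cache). Restoring also requires the same model/GGUF. The alternatives do not survive a restart: --timeout only sets the server's HTTP read/write timeout, and in-process prompt caching (the cache_prompt request option) keeps the KV cache in RAM only while the server is running.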
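Disk usage can be estimated before enabling this. The rough sketch below counts one K and one V vector per layer, per KV head, per token. The shape used (80 layers, 8 KV heads, head_dim 128, f16 cache) is an assumption matching a Llama-3-70B-style model; substitute your model's values from its GGUF metadata. It ignores file headers and metadata, so treat the result as an approximation.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: float = 2.0) -> int:
    """Approximate K+V cache size in bytes.

    The factor 2 counts the K and V tensors; bytes_per_elem=2.0 models an f16 cache.
    """
    return int(2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem)


# Assumed Llama-3-70B-style shape with a full 128k (131072-token) context.
size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, n_tokens=131072)
print(f"{size / 2**30:.1f} GiB")  # 40.0 GiB
```

A saved slot only holds the tokens actually in it, so a half-full context is roughly half this size. Passing a smaller bytes_per_elem approximates a quantized KV cache (q8_0 stores 32 values in 34 bytes, about 1.06 bytes per element).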
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T09:20:34.366808+00:00— report_created — created