Report #29959
[tooling] llama-server losing context on restart requires full re-prompting
Start llama-server with --slot-save-path /tmp/slots, then save each slot's KV cache with POST /slots/{id}?action=save before shutdown and restore it with POST /slots/{id}?action=restore after restart, eliminating warm-up time for long contexts
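A minimal sketch of driving slot persistence from Python with only the standard library. It assumes the slot-management endpoint described in the llama.cpp server README (`POST /slots/{id_slot}?action=save` or `action=restore`, with a JSON body carrying a bare `filename` that the server resolves inside `--slot-save-path`) and a server on the default `http://localhost:8080`. The helper names `build_slot_request` and `slot_action` are hypothetical, not part of llama.cpp.

```python
import json
import urllib.request


def build_slot_request(base_url: str, slot_id: int, action: str, filename: str) -> urllib.request.Request:
    """Build a POST /slots/{id}?action=... request (hypothetical helper)."""
    # Only the two actions this workaround needs are accepted here.
    if action not in ("save", "restore"):
        raise ValueError(f"unsupported action: {action!r}")
    url = f"{base_url.rstrip('/')}/slots/{slot_id}?action={action}"
    # The filename is a bare name; the server places it under --slot-save-path.
    body = json.dumps({"filename": filename}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def slot_action(base_url: str, slot_id: int, action: str, filename: str,
                timeout: float = 600.0) -> dict:
    """Send the save/restore request and return the server's JSON reply."""
    req = build_slot_request(base_url, slot_id, action, filename)
    # Large caches can take a while to write or read, hence the generous timeout.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, call `slot_action("http://localhost:8080", 0, "save", "slot0.bin")` before stopping the server, then the same call with `"restore"` after restarting it with the same model and `--slot-save-path`. Subsequent requests should be pinned to that slot (the `id_slot` field of `/completion`) so the restored prefix can actually be reused.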
Journey Context:
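By default llama-server keeps slot state (the KV cache) only in memory, so a restart forces the entire context window to be re-processed. Many users assume this stateless behavior is unavoidable or try to work around it with external vector databases. The built-in alternative is slot persistence: start the server with --slot-save-path pointing at a directory, serialize a slot's KV cache to a file there with POST /slots/{id}?action=save before shutdown, and load it back with POST /slots/{id}?action=restore after startup. Saving and restoring are explicit API calls rather than automatic on SIGTERM or startup, so a shutdown script or supervisor has to issue them. Tradeoff: each saved slot costs disk space proportional to the number of cached tokens and the model's KV dimensions (e.g., about 4 GiB for a full 32k context on a Llama-3-8B-class model with an f16 cache), but reading that back is usually much faster than re-running prompt processing over tens of thousands of tokens.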
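The disk-usage figure follows from the standard KV-cache size formula: one K and one V tensor per layer, each holding `n_kv_heads * head_dim` values per token. A small sketch of that arithmetic, assuming a Llama-3-8B-style shape (32 layers, 8 KV heads, head dimension 128) and a 2-byte f16 cache; `kv_cache_bytes` is a hypothetical helper, and the real slot file adds a little overhead (token IDs, headers) on top of this.

```python
def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int) -> int:
    """Approximate K+V cache size in bytes for a GQA transformer."""
    # Factor 2: one K tensor and one V tensor per layer.
    return 2 * n_layers * n_tokens * n_kv_heads * head_dim * bytes_per_elem


GIB = 1024 ** 3

# Full 32k context, Llama-3-8B-style shape, f16 cache (2 bytes per element).
full = kv_cache_bytes(32_768, 32, 8, 128, 2)
# A slot that has only cached 16k tokens needs proportionally less.
half = kv_cache_bytes(16_384, 32, 8, 128, 2)

print(full / GIB)  # 4.0
print(half / GIB)  # 2.0
```

Because the size is linear in the token count, a slot that has only processed part of its context should produce a correspondingly smaller file than the configured context length suggests.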
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:40:36.749273+00:00— report_created — created