Report #54034
[tooling] Recomputing expensive system prompts after every server restart in stateless deployments
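Start llama-server with `--slot-save-path /path/to/dir` to enable saving KV cache slots to disk. The flag alone does not save or reload anything: after the system prompt has been processed once, save the slot with `POST /slots/{id_slot}?action=save`, and after a restart restore it with `POST /slots/{id_slot}?action=restore`. Each call names a file inside the save directory. Restoring the pre-computed KV cache avoids the minutes-long re-ingestion of 100k+ token contexts.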
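The save and restore calls can be sketched as below. This is a minimal sketch, not a verified deployment recipe: it assumes the `/slots/{id_slot}?action=save|restore` endpoints and the `{"filename": ...}` JSON body described in the llama.cpp server README. The base URL, slot id, and file name are hypothetical, and the helper names `build_slot_request` and `send_slot_request` are made up for illustration.

```python
import json
import urllib.request

VALID_ACTIONS = ("save", "restore")


def build_slot_request(base_url: str, slot_id: int, action: str,
                       filename: str) -> urllib.request.Request:
    """Build the POST request that saves or restores one slot.

    `filename` is resolved by the server relative to --slot-save-path
    (assumption based on the llama.cpp server README).
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    url = f"{base_url.rstrip('/')}/slots/{slot_id}?action={action}"
    body = json.dumps({"filename": filename}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slot_request(req: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send the request and return the server's JSON reply as a dict.

    The generous timeout is a guess: writing or reading a multi-GB
    cache file can take a while.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `send_slot_request(build_slot_request("http://localhost:8080", 0, "save", "system_prompt.bin"))` would be issued once after the system prompt has been ingested, and the same call with `"restore"` after each restart, before the first user request reaches that slot.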
Journey Context:
In production, llama-server instances restart for updates or after crashes. Without persistence, every slot (the per-sequence state that holds a conversation's KV cache) is lost, forcing recomputation of the full prompt history, which is expensive for RAG with long documents. The `--slot-save-path` flag enables serializing slot state, including the KV cache, to files in the given directory. After a restart, a saved slot can be restored provided the server loads the same model. This matters for stateful agents using long contexts. Tradeoffs: disk space (potentially GBs per slot, growing with context length) and I/O latency on both save and restore. An alternative is Redis or another external cache, but native slot persistence is underused and needs no extra infrastructure.
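To get a feel for the disk-space tradeoff, the KV cache size can be estimated from the model shape. The sketch below uses the standard formula (keys plus values, per layer, per KV head, per token). The example shape (32 layers, 8 KV heads, head dimension 128, 16-bit cache) is an assumption chosen to resemble a Llama-3-8B-class model, and the saved file may differ somewhat from this figure because of metadata or a quantized cache type.

```python
def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size in bytes for one sequence.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to an f16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


# Hypothetical example shape: 32 layers, 8 KV heads, head_dim 128, f16 cache.
per_token = kv_cache_bytes(1, n_layers=32, n_kv_heads=8, head_dim=128)
total = kv_cache_bytes(100_000, n_layers=32, n_kv_heads=8, head_dim=128)

print(f"{per_token / 1024:.0f} KiB per token")       # 128 KiB per token
print(f"{total / 2**30:.1f} GiB for 100k tokens")    # 12.2 GiB for 100k tokens
```

Under these assumptions a single 100k-token slot is on the order of 12 GiB, so the save directory should be sized and placed on fast storage accordingly.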
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:11:37.437367+00:00— report_created — created