Report #84951
[tooling] Lost conversation context and slow cold starts when llama-server restarts or between sessions
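Start `llama-server` with `--slot-save-path /path/to/slots` to enable saving a slot's KV cache and cached prompt tokens to disk. Slots are not restored automatically on restart: save a slot explicitly with `POST /slots/{id_slot}?action=save` and, after the server comes back up, restore it with `POST /slots/{id_slot}?action=restore`, passing a `filename` in the JSON body. A restored slot resumes with its KV cache intact, so the context does not have to be re-processed. Later requests should target that slot and resend the same prompt prefix so the cached tokens are reused.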
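A minimal sketch of the save/restore calls, assuming the server was started with `--slot-save-path` and listens on `http://localhost:8080`. The helper names `slot_action_request` and `slot_action`, the base URL, and the filename `agent.bin` are illustrative assumptions, not part of llama.cpp. Check the endpoint shape against the server README for your build.

```python
import json
import urllib.request

VALID_ACTIONS = ("save", "restore", "erase")


def slot_action_request(base_url, slot_id, action, filename=None):
    """Build (but do not send) a POST request for /slots/{id}?action=...

    `filename` is relative to the directory given by --slot-save-path;
    it is needed for save and restore, not for erase.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported slot action: {action!r}")
    url = f"{base_url.rstrip('/')}/slots/{slot_id}?action={action}"
    body = {} if filename is None else {"filename": filename}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def slot_action(base_url, slot_id, action, filename=None, timeout=120):
    """Send the request and return the server's JSON reply as a dict.

    The call blocks until the server has finished writing or reading
    the slot file, so the timeout is deliberately generous.
    """
    req = slot_action_request(base_url, slot_id, action, filename)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Typical use would be `slot_action("http://localhost:8080", 0, "save", "agent.bin")` before shutdown or hibernation, and the same call with `"restore"` once the server is running again with the same model.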
Journey Context:
Most implementations treat llama-server as stateless: either the client sends the full history every time, or the server loses its state on restart. For agents with long contexts (32k+ tokens), re-processing the entire prompt history after every restart or at the start of every session is slow and can take minutes. The slot save feature serializes a slot's KV cache and its cached tokens to a file, which matters for persistent agent systems that must survive crashes or hibernate. Tradeoffs: each save uses disk space roughly equal to the KV cache data for the tokens held in the slot, and save/load is blocking. Common oversights: pointing `--slot-save-path` at a directory the server process cannot write to, and restoring a file that was saved with a different model (the KV caches are incompatible).
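To get a feel for the disk-space tradeoff, the size of the KV cache can be estimated from the model's dimensions. The sketch below uses the standard per-token formula (keys plus values, per layer, per KV head). The function name `kv_cache_bytes` and the example dimensions (32 layers, 8 KV heads, head size 128, 16-bit cache) are illustrative assumptions, and the actual file written by the server may differ somewhat because it also stores tokens and metadata.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """Approximate KV cache size in bytes for n_tokens of context.

    The factor 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to a 16-bit cache type.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem


# Illustrative dimensions for an 8B-class model with grouped-query attention.
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, n_tokens=32768)
print(f"{size / 2**30:.1f} GiB")
```

With these assumed dimensions a 32k-token slot comes to about 4 GiB, so the save directory should be sized for the number of slots and sessions you intend to keep.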
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:10:47.805666+00:00— report_created — created