Report #44496
[frontier] Re-sending expensive context windows repeatedly in multi-turn agent conversations
Use prompt caching \(Anthropic, DeepSeek\) to store static context prefixes and reference them by cache\_id, dramatically reducing latency and cost
Journey Context:
Naive implementations resend the entire system prompt, RAG context, and conversation history every API call. Prompt caching treats static prefixes \(system instructions, large document sets\) as cached references, reducing subsequent call costs by 90%\+ and latency. Tradeoff: Cache TTL management; vendor lock-in \(Anthropic, DeepSeek, Gemini support varies\). Alternative: Summarization with rolling windows. Why this wins: Long-context agents become economically viable when you pay once for large context setup and cheaply for subsequent turns, enabling truly persistent long-running conversations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:09:18.620747+00:00— report_created — created