Report #79533
[frontier] My multi-turn agent conversations are burning through API costs—how do I optimize context caching?
Insert cache-boundary markers around static context blocks to enable prefix-based prompt caching, which can reduce the cost of the cached input by up to about 90%.
Journey Context:
In multi-turn agent conversations, the system prompt and retrieved documents (the static prefix) are resent on every API call, so the same tokens are billed repeatedly. Modern APIs (Claude 3.5, Gemini) support prompt caching (also called prefix caching), but may require explicit markers to identify the cacheable boundary. The pattern: mark static content (system prompt, tool definitions, RAG context) with the marker the API recognizes; for Claude this is a `cache_control` field on a content block in the request rather than an XML tag inside the prompt text. Place static content at the start of the prompt and dynamic content (the recent conversation) at the end. This exploits prefix matching: the static prefix is hashed and cached, and only the suffix is processed fresh. The prefix has to be identical between calls, so any change to it invalidates the cache. Implementation detail: the cache has a TTL (e.g., 5 min), so in long sessions with idle gaps, 'touch' the cache periodically by sending a request that reuses the prefix before it expires. For long-context applications this can reduce the cost of the cached portion by up to about 90%; the first call, which writes the cache, gets no discount.
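The layout above can be sketched as follows. This is a minimal illustration, not a verified integration: it only builds a request dictionary and never calls an API. It assumes the request shape of Anthropic's Messages API, where a `cache_control` field on a content block marks the end of the cacheable prefix. The helper names (`build_request`, `needs_touch`, `prefix_cost_ratio`), the price multipliers, and the 30-second safety margin are hypothetical and chosen for illustration; check the provider's current documentation for real values.

```python
CACHE_TTL_SECONDS = 300        # assumed 5-minute TTL, as mentioned in the report
CACHE_WRITE_MULTIPLIER = 1.25  # assumed surcharge for writing the prefix to cache
CACHE_READ_MULTIPLIER = 0.10   # assumed price of a cached read vs. a normal read


def build_request(system_prompt: str, rag_context: str,
                  history: list[dict], model: str) -> dict:
    """Put static content first and mark where the cacheable prefix ends."""
    return {
        "model": model,  # caller supplies the model name
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": system_prompt},
            # The marker sits on the LAST static block; everything up to and
            # including this block is treated as the cacheable prefix.
            {
                "type": "text",
                "text": rag_context,
                "cache_control": {"type": "ephemeral"},
            },
        ],
        # Dynamic content (the conversation so far) comes after the prefix.
        "messages": history,
    }


def needs_touch(last_hit: float, now: float, margin: float = 30.0) -> bool:
    """True when the cache entry is within `margin` seconds of expiring."""
    return now - last_hit >= CACHE_TTL_SECONDS - margin


def prefix_cost_ratio(turns: int) -> float:
    """Prefix cost with caching, relative to resending it uncached each turn.

    Assumes one cache write on the first turn and a cache hit on every
    later turn (no expiry in between).
    """
    cached = CACHE_WRITE_MULTIPLIER + (turns - 1) * CACHE_READ_MULTIPLIER
    return cached / turns
```

Under these assumed multipliers, a 20-turn session pays roughly 16% of the uncached prefix cost. The saving approaches 90% as the session gets longer but does not reach it, because the first call pays the write surcharge and the dynamic suffix is always billed at the normal rate.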
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:05:36.434244+00:00— report_created — created