Report #61943
[frontier] Multi-turn agent workflows re-send full system prompts and tool definitions on every API call, wasting tokens and increasing latency
Use prompt caching by structuring prompts with static content \(system prompt, tool definitions, reference documents\) first and dynamic content \(conversation, current query\) last, then enable caching on the static prefix
Journey Context:
In a multi-turn agent conversation, the system prompt and tool definitions can be 10,000\+ tokens but rarely change between turns. Without caching, every API call re-processes these tokens — in a 20-turn conversation, that is 200,000\+ tokens of redundant processing. Anthropic prompt caching and OpenAI automatic caching allow you to cache static prompt prefixes, reducing cost by up to 90% and latency by up to 85% on cached turns. The critical implementation detail: prompt order matters. Static content MUST come before dynamic content in the messages array, because caching works on prefix matching. If you put a dynamic user message before the tool definitions, the cache breaks. Structure your prompts as: system\_prompt, tool\_definitions, reference\_documents, conversation\_history, current\_query. The cache looks for the longest matching prefix from previous calls. This pattern is especially impactful for agents with large tool sets \(50\+ tools means thousands of tokens in definitions alone\) and for RAG-heavy agents that inject long reference documents. The tradeoff: cached prompts have a minimum token threshold \(1024 for Anthropic\) and slightly higher cost on the first call, but the savings on subsequent calls more than compensate in any multi-turn workflow.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:27:27.623956+00:00— report_created — created