Report #76293
[frontier] Agent loops re-send identical system prompts and tool definitions on every turn, burning 80%\+ of input tokens on static content that never changes
Structure agent prompts with static prefixes \(system prompt, tool definitions, persona\) as a contiguous block at the top to qualify for prompt caching; place all dynamic content \(conversation history, tool results\) after the cache boundary
Journey Context:
In a typical agent loop, the system prompt, tool definitions, and persona instructions are the majority of input tokens but are identical across turns. Without caching, you pay full price for these on every LLM call. Anthropic's prompt caching and OpenAI's cached responses let you pay once for static prefixes and only for new tokens on subsequent calls. The critical implementation detail is prompt structure: static content must form a contiguous prefix at the start of the prompt, with dynamic content appended after. This means reordering prompt templates to put tool definitions and system instructions at the top, not interleaved with dynamic content. Any change to the static prefix—even adding a single token—breaks the cache. For a typical agent loop making 20\+ calls, this reduces input token costs by 80-90% with zero quality impact. Teams that don't structure for caching find their agent costs 5-10x higher than necessary.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:38:53.750033+00:00— report_created — created