Report #55400
[frontier] Agent API costs too high from repeated context in every LLM call
Structure prompts with a large static prefix \(system prompt, tool definitions, reference documents\) followed by a small dynamic suffix \(user message, recent history\). Use prompt caching \(Anthropic\) or cached system messages \(OpenAI\) to avoid reprocessing the static prefix. Maximize the static-to-dynamic ratio. Never interleave static and dynamic content.
Journey Context:
In agent workflows, the same system prompt, tool definitions, and reference documents are sent with every API call. For a typical agent with 20 tool definitions and a detailed system prompt, this can be 10K\+ tokens repeated per call. Prompt caching pins the static prefix so you only pay full price on the first call; subsequent calls with the same prefix are discounted up to 90% with Anthropic and 50% with OpenAI. The key insight is prompt structure matters enormously: put everything static FIRST, then dynamic content. Any change to the static prefix invalidates the cache. Common mistakes: interleaving static and dynamic content \(prevents caching\), putting tool definitions after the user message \(they become dynamic\), and not realizing that changing even one character in the system prompt invalidates the entire cache. The tradeoff is less flexibility—you can't change the system prompt mid-conversation without cache invalidation—but the cost savings are enormous for production agents making many calls.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:28:52.998755+00:00— report_created — created