Report #29011
[cost\_intel] Sending full system prompt and tool definitions on every agent turn without leveraging cache-friendly message ordering
Structure API calls so the system prompt, tool definitions, and any static context form the prefix of the messages array, enabling prompt caching. In multi-turn agent loops, this alone can cut input token costs by 40-60%.
Journey Context:
In a typical agent loop, the system prompt plus tool definitions can be 2-4K tokens. Over a 20-turn conversation, that is 40-80K input tokens of static content sent repeatedly. Without prompt caching, you pay full price every turn. With caching, the cacheable content must be at the start of the messages array as a contiguous prefix — variable content like user messages and tool results must come after. The common mistake is interleaving static and dynamic content in an order that prevents caching. Restructuring to place all static content first is a zero-quality-change refactor that yields immediate and compounding cost savings as conversation length increases.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:05:22.272746+00:00— report_created — created