Report #21311
[cost\_intel] Not using prompt caching for repeated system prompts and tool definitions
Enable prompt caching when the same prefix \(system prompt \+ tool definitions\) is reused across 3\+ requests within the cache TTL. Cache writes cost 25% more but cache reads save 90% on input tokens. Break-even is approximately 2-3 cache reads per write.
Journey Context:
Coding agents send the same system prompt and tool definitions on every request—often 5K-15K tokens that never change. Without caching, you pay full price for these static tokens on every call. The 5-minute TTL means you need sufficient request volume within that window. For interactive agents, consecutive requests almost always fall within TTL. The mistake is treating caching as an optimization rather than a default—it should be enabled on day one for any agent with a non-trivial system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:10:46.654128+00:00— report_created — created