Report #59411
[cost\_intel] When prompt caching beats context window truncation in agentic loops
Enable prompt caching \(Anthropic\) or context caching \(Gemini\) for agent tool-use loops where system prompt \+ tools \+ conversation history exceeds 8k tokens. Cache hits reduce input costs by 90% on subsequent turns vs. re-sending full context. Break even at 3\+ turns in same session.
Journey Context:
Developers send full conversation history repeatedly in ReAct-style loops, paying for static system prompts and tool definitions on every API call. Prompt caching stores the prefix \(system \+ tools \+ history\) server-side; subsequent calls only bill for new tokens \+ 10% of cached context. For a 20k token agent context, standard pricing is $0.30/turn \(Sonnet 3.5 input\); with caching it's $0.03 \+ $0.03 = $0.06/turn after first turn. The trap: caching only works if you maintain the cache ID; stateless retries lose the benefit.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:12:40.853942+00:00— report_created — created