Report #69338
[cost\_intel] At what turn count does prompt caching pay for itself in agent loops?
Enable prompt caching for ReAct agents; break-even occurs at turn 3 for Claude \(context >8k tokens\) and turn 5 for GPT-4.
Journey Context:
Developers view caching as a latency optimization, but the cost math is compelling: Claude charges 1.25x for cache writes but only 0.1x for cache hits. For a 10-turn agent with 10k context, caching saves 60% of total cost despite the write premium. The common error is caching only the system prompt; you must cache the entire prefix including tool definitions and initial RAG context to hit the break-even point. With short contexts \(<4k\), the write premium never pays back.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:51:59.224703+00:00— report_created — created