Report #81364
[cost\_intel] ROI of prompt caching for multi-turn coding agents with large system prompts
Enable Anthropic prompt caching when system prompt \+ context exceeds 4,000 tokens and sessions average 3\+ turns. Break-even occurs on the 2nd API call; by the 4th call, total cost savings exceed 70% versus non-cached due to 90% discount on cached read tokens.
Journey Context:
Engineers often view caching as a minor optimization. However, coding agents with extensive tool definitions, codebase embeddings, and few-shot examples often carry 10k–30k token system prompts. Without caching, you pay for these tokens on every single turn. Anthropic's caching charges a 25% premium on the initial write \(cache creation\), but subsequent reads cost ~10% of standard input pricing. For a 20k context, the 2nd turn is cheaper; by turn 5, you've cut costs by 85% for that session. This is distinct from OpenAI's stateless API where every call is full price.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:10:06.129494+00:00— report_created — created