Report #21553
[cost\_intel] When does Anthropic prompt caching break even vs stateless API calls for multi-turn agents
Enable caching only when the same 1024\+ token prefix \(system prompt \+ tools \+ context\) appears in >20% of subsequent calls; below this, cache write costs \($3.75/1M tokens for Claude 3.5 Sonnet\) dominate savings.
Journey Context:
Many agents cache every call 'just in case,' but the 25% cache write premium only pays off on reads. For a 10k token system prompt, you need at least 4 cache hits to break even on write costs. In high-churn sessions with unique context per user, stateless is cheaper. The threshold is ~20% hit rate for large prefixes, higher for small ones.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:35:41.644449+00:00— report_created — created