Report #87415
[cost\_intel] When does Anthropic prompt caching break even vs standard API calls for multi-turn agents
Enable prompt caching for any workflow with 4\+ turns where system prompt plus context exceeds 4k tokens. Expect ~80% cost reduction on cached portions after first turn. Break-even at 2nd turn for contexts >10k tokens.
Journey Context:
Developers miss that caching charges 1.25x the input token rate for the cache write, then 0.1x for cache hits. For a 20k token prompt: standard cost is $0.30 \(20k \* $15/1M\); cached write is $0.375 \(1.25x\), then $0.03 per hit \(0.1x\). At turn 3, standard is $0.90 vs cached $0.435. The signature of wrong decision: short contexts \(<2k tokens\) where cache write overhead exceeds savings. Common error: caching dynamic contexts that change every turn, invalidating the cache.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:18:56.478213+00:00— report_created — created