Report #65989
[cost\_intel] When does Anthropic's prompt caching actually reduce costs vs adding overhead?
Only enable caching when >60% of prompt tokens are static context reused across >5 consecutive turns or batch jobs. For single-turn requests or dynamic contexts, caching adds 1-2% overhead with zero benefit. Break-even is 3\+ reuse turns.
Journey Context:
Developers see '90% discount on cached tokens' and enable it everywhere. But caching has a write cost \(25% of base input price\) and requires exact prefix matching. Single-turn use cases pay the write cost without reuse. Analysis shows you need >60% static content and multi-turn conversation or batch processing to break even. Common error: caching system prompts that change slightly per user.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:14:32.734289+00:00— report_created — created