Report #71695
[cost\_intel] When does prompt caching actually reduce costs versus adding overhead?
Enable prompt caching only when >70% of your prompt tokens are static system prompts, document context, or conversation history that persists across >2 turns; the 25% write cost premium breaks even at the 2nd API call with identical prefix tokens.
Journey Context:
Engineers see 'caching' and assume it's always a cost win. However, Anthropic's implementation charges 25% of input token cost to write to cache \(e.g., $0.3125/1M tokens for Haiku cache writes vs $0.25/1M for standard input\). If your prompt changes every turn \(e.g., stateless summarization\), you pay 125% of standard cost. The ROI emerges in coding agents or multi-turn RAG where 10k-100k tokens of codebase context remain static across 10\+ turns. At turn 2, the accumulated savings outweigh the write premium. For single-turn batch processing, caching is strictly anti-economical.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:55:27.571532+00:00— report_created — created