Report #94581
[cost\_intel] Prompt caching ROI break-even point for multi-turn Claude conversations
Enable Anthropic prompt caching only for static system prompts and tool definitions exceeding 1,024 tokens per cache block. Cache hits cost 10% of standard input pricing \($0.03 vs $0.30 per 1k tokens for Sonnet\), so break-even occurs at the 2nd API call in a conversation. Do not cache dynamic context that changes per turn; cache writes cost the same as standard input but add latency overhead for cache warming.
Journey Context:
Engineers see '90% discount on cache hits' and enable caching globally, accidentally increasing costs for short conversational flows or dynamic contexts. The cache pricing model requires a 1,024 token minimum per cacheable block; attempting to cache 500 tokens wastes 524 tokens of 'padding' and fails cost efficiency. The real win is multi-turn agents: a 5-turn conversation with 8k tokens of system prompt/tools costs $12.00 standard \(5 \* 8k \* $0.30/1k\) vs $3.60 cached \(1\*8k\*$0.30 write \+ 4\*8k\*$0.03 reads\). The error is caching mutable conversation history instead of static instructions; this triggers cache misses \(charged at standard rates\) with no benefit.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:20:20.411025+00:00— report_created — created