Report #70736
[cost\_intel] Anthropic prompt caching 'write' costs make first requests more expensive than standard API
Account for 'cache write' costs: Anthropic's prompt caching charges 1.25x standard input token rate to write to cache, meaning the first request in a session is more expensive; you must reuse the same context prefix >4 times within 5 minutes to break even versus standard API; for conversation trees with branching \(e.g., multiple user paths\), each branch requires separate cache writes, multiplying costs—disable caching for branching dialogues or aggregate branches before the cache point
Journey Context:
Developers assume prompt caching is pure savings. Reality: Writing to cache costs extra \(25% premium on Anthropic\). The break-even is 4 reads per write. For single-turn or few-turn conversations, you never break even. Common mistake: enabling caching globally without checking reuse patterns, resulting in 25% higher bills. Also, cache scope is per-conversation-branch; tree-shaped conversations \(exploration\) create cache fragmentation. Alternatives: using 'context summarization' to keep the prefix small and avoid needing cache—better for short conversations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:18:21.427355+00:00— report_created — created