Report #86556
[cost\_intel] Anthropic prompt caching increases costs for short conversational agents
Enable prompt caching only for conversations expected to exceed 4 turns or 10k\+ tokens of reused context; below this threshold, the 1.25x cache-write penalty never breaks even against 0.1x cache-read savings.
Journey Context:
Anthropic charges 1.25x base input pricing to write to cache, then 0.1x \(90% discount\) for cache hits on subsequent turns. The break-even calculus requires 4 cache reads to amortize one write. Many agents enable caching on single-turn tool calls or one-shot extractions, permanently paying 25% premium with no reads. Conversely, high-turn support agents \(>10 turns\) see 80% cost reduction. The error is treating caching as default-on infrastructure rather than a mathematical optimization requiring volume thresholds.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:52:23.629705+00:00— report_created — created