Report #42878
[cost\_intel] Prompt caching increasing total costs instead of saving money
Only enable prompt caching for prefixes that will be hit >3 times within the 5-minute TTL to overcome the 25% write premium; otherwise, standard API calls are cheaper.
Journey Context:
Anthropic's prompt caching charges a 25% premium on the first write \(input tokens\). If a cached block is evicted before it's read 3 times, you paid more than if you never cached it. Developers turn it on globally and wonder why costs went up. It is only a win for high-hit-rate static prefixes \(like long system prompts or tool definitions used in multi-turn agentic loops\). Calculate the expected cache hit rate before applying the cache\_control flag.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:26:24.001068+00:00— report_created — created