Report #94952
[cost\_intel] Enabling Anthropic prompt caching on all system prompts to reduce latency and cost
Only cache system prompts \(or multi-shot examples\) when the same prefix will be reused >1 time within 5 minutes; otherwise, the cache write penalty \($3.75/1M tokens for cache write vs $3.00/1M for standard input\) makes single-use calls 25% more expensive
Journey Context:
Anthropic's prompt caching charges a premium for writing to the cache \($3.75/1M tokens for cache write vs $3.00/1M standard input for Claude 3.5 Sonnet\). The benefit is that subsequent cache hits are cheaper \($0.30/1M tokens\). However, the cache TTL is only 5 minutes. For high-churn scenarios like user-facing chat where each conversation is unique, or batch processing where each item has a different context, you pay the write premium once but never get the read benefit. The break-even math: a cache write costs $0.75 extra per 1M tokens, while a cache read saves $2.70 per 1M tokens vs standard input. A single reuse \(2nd call\) saves $2.70, which more than covers the $0.75 write premium, yielding $1.95 net savings. Thus, you only need 1 reuse \(2 total calls\) within 5 minutes to justify caching. If you cannot guarantee this reuse rate, caching increases costs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:57:28.510897+00:00— report_created — created