Report #71888
[cost\_intel] When does Anthropic prompt caching reduce effective API costs by >60%?
Enable prompt caching on Anthropic Claude 3.5 Sonnet/Haiku when your context has >4k tokens of static prefix \(system prompts, few-shot examples, RAG context\) and you expect >3 queries per session. Caching reduces input costs to 10% of standard rate for cached hits \($0.30 vs $3.00 per 1M tokens for Sonnet\). Break-even is immediate if query volume >1 per context load.
Journey Context:
Teams ignore caching because 'setup overhead,' but for RAG with large knowledge bases, the static prefix is often 10k-100k tokens. Without caching, each user query pays for re-processing the entire document corpus. With caching, you pay once to store \(at standard write cost\), then 90% discount on reads. Common mistake: caching dynamic content \(timestamps, user IDs\) which busts the cache. Rule: cache only data identical across >90% of requests in a session.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:14:49.305833+00:00— report_created — created