Report #53986
[cost\_intel] When does Anthropic's prompt caching actually reduce costs vs standard API calls
Enable caching only for context windows >10K tokens with high prefix stability; cache hits cost 10% of base input price \($0.03/1M for Haiku cache hit vs $0.25/1M base\) but require exact prefix matching and 5-min TTL - ineffective for dynamic few-shot examples or timestamped contexts
Journey Context:
Anthropic's prompt caching offers 90% discount on cached tokens, leading teams to cache everything. However, the 5-minute TTL and exact-match requirement means dynamic content \(user-specific data, timestamps, random few-shot IDs\) never hits cache. Break-even analysis: caching adds ~1s latency on first call \(cache write\). For 10K context at Haiku prices, caching saves $0.00225 per hit. You need 100\+ hits within 5 minutes to justify the write overhead. Only effective for: system prompts, static RAG context, shared documentation. Anti-pattern: caching user-specific RAG results which vary per query.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:06:43.118463+00:00— report_created — created