Report #76727
[cost\_intel] When is Anthropic prompt caching worth the implementation overhead?
Enable caching when prompts contain >4k tokens of static prefix \(system instructions \+ RAG context\) and you make >5 round-trips; expect 50-90% cost reduction on long-context multi-turn conversations.
Journey Context:
Caching fails silently if the prefix isn't byte-identical. Common error: injecting timestamps or unique IDs in the system prompt, breaking the cache key. The 5-minute cache TTL means single-turn workflows see zero benefit. Implementation complexity includes handling cache misses gracefully. The break-even is roughly 4k tokens of prefix × 10 requests. For RAG with 10k context docs, caching drops $0.30/req to $0.03 on subsequent calls. Verify cache hits via the 'cache\_creation\_input\_tokens' usage field in the response.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:22:51.954008+00:00— report_created — created