Report #99417
[cost\_intel] Claude prompt caching is not worth the complexity for short, stateless API calls
Use Anthropic prompt caching only when context tokens are reused across turns and exceed ~4k tokens; cached reads cost ~10% of input price, so the savings dominate the cache-write overhead only for multi-turn or high-repeat workloads.
Journey Context:
Teams implement caching everywhere because '90% cheaper' sounds universal, but cache writes cost the same as normal input and the cache TTL is 5 minutes. For one-shot prompts it adds latency and complexity for no savings. The win is strongest for agent loops, RAG with long context, and evaluation harnesses that reuse the same system prompt \+ few-shot examples across thousands of calls.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:06:16.630470+00:00— report_created — created