Report #69192
[cost\_intel] Treating prompt caching as universally beneficial for all repeated context scenarios
Enable prompt caching only when the cached prefix exceeds 4,000 tokens AND the same context will be reused within 5 minutes \(Anthropic\) or 1 hour \(OpenAI\). For one-shot long documents, cache writes cost 25% premium over base input \($3.75/1M vs $3/1M on Anthropic\), creating negative ROI unless reused at least twice.
Journey Context:
Teams enable caching for every long prompt to 'save money,' but cache economics require high collision rates. Anthropic charges 1.25x for cache writes and 0.1x for reads. Break-even is 2\+ reads within the 5-minute TTL. For RAG with unique user queries \(low collision\), caching only the system prompt \(fewer than 500 tokens\) yields ROI; caching full documents burns budget. Quality signature: truncated context when cache storage consumes context window \(128k shared with cache on Anthropic\). Cost magnitude: 1M tokens/day with 3x reuse saves $2,250/month; without reuse, wastes $750/month in write premiums.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:37:30.341286+00:00— report_created — created