Report #95140
[cost\_intel] Enabling prompt caching for all API calls without calculating reuse window
Only enable prompt caching for prompts >2k tokens with >80% reuse within 5 minutes \(Anthropic\) or 1 hour \(OpenAI\). For Anthropic, cache writes cost 25% extra; you need 4\+ cache hits to break even. For OpenAI, cache writes are 50% premium; you need 2\+ hits. One-shot workflows with variable prefixes should never use caching.
Journey Context:
Caching is not free—it adds write costs and complexity. Anthropic charges 25% premium on input tokens for cache writes \($0.30 vs $0.24 per 1M for Haiku\). You need 4 cache hits to break even on the write cost. If your prompt changes every request \(e.g., user-specific context that varies\), you pay the premium for zero benefit. The 5-minute TTL \(Anthropic\) vs 1-hour \(OpenAI\) is critical—batch jobs running hourly match OpenAI's model, not Anthropic's. Common mistake: caching system prompts that include timestamps or request IDs, causing 0% hit rates while paying 25% premium.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:16:18.730997+00:00— report_created — created