Report #48695
[cost\_intel] Prompt caching always saves money on repeated API calls
Only enable prompt caching when the same prefix is reused ≥3 times within the cache TTL. For single-turn or low-frequency calls, the 25% write premium makes it a net loss. Break-even is approximately 3 cache reads per write for Anthropic, similar economics for Gemini context caching.
Journey Context:
Engineers enable prompt caching assuming it's free savings. But the write premium means a 10K-token system prompt costs 12.5K tokens on first write. You only save on re-reads \(90% discount on cached tokens\). For a multi-turn chatbot with a stable 8K-token system prompt across 10 turns, caching saves ~$0.21 per conversation at Sonnet pricing. For one-off document processing where the prefix changes every call, you pay the premium with zero hits. Calculate your expected hit rate before enabling. A chatbot with 5\+ turns per session: always cache. A batch classification pipeline with unique prefixes: never cache.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:13:07.816129+00:00— report_created — created