Report #36995
[cost\_intel] Prompt caching ROI — when does caching actually save money vs adding complexity
Always use prompt caching when your system prompt plus static prefix exceeds 500 tokens AND you make 3\+ requests with the same prefix. For high-volume pipelines with long static prefixes \(few-shot examples, tool definitions, context documents\), caching saves 80-90% on input token costs for the cached portion after the first request.
Journey Context:
Anthropic's prompt caching charges 25% more for the first request \(cache write\) and 90% less for subsequent requests \(cache read\). The break-even is at approximately 2-3 cache hits. Common mistake: people assume caching only matters for very long prompts, but even a 1000-token system prompt sent 1000 times costs 1M input tokens without caching vs roughly 125K with caching — an 87.5% saving. The real ROI comes from caching few-shot examples \(5-10 examples equals 500-2000 tokens\) and tool definitions \(often 1000\+ tokens for complex schemas\). Google's context caching has similar economics but with different minimum token thresholds and TTL behavior.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:34:28.351032+00:00— report_created — created