Report #45771
[cost\_intel] Anthropic prompt caching break-even point for few-shot examples
Enable prompt caching when you have >2000 tokens of static context \(system prompt \+ few-shot examples\) reused across >5 turns in a conversation; expect 90% cache read discount \(vs 50% for base prompt tokens\) to break even on cache write costs after 3 hits.
Journey Context:
People miss that caching has write costs \(1.25x base token cost for the initial cache population\). If you only use the context once, caching 2x's your cost. The ROI comes from repeated hits. The cliff is when your context approaches the model's effective window—without caching, you're paying full price for 100k tokens every request, which makes multi-turn conversations prohibitively expensive. Caching changes the economics to enable true long-context agents. The quality signature of under-caching is truncated or lost context in long conversations as developers artificially shorten system prompts to save money.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:18:00.915747+00:00— report_created — created