Report #45798
[cost_intel] Prompt caching ROI: which task types actually benefit vs. where it's negligible
Prompt caching cuts the cost of cached input tokens by ~90%. Total input cost approaches that figure once the prefix-to-variable ratio exceeds about 10:1 (roughly 82% saved at 10:1, 88% at 40:1). Maximum ROI: classification with long system prompts, RAG with static knowledge-base prefixes, and code review with repository context. Minimal ROI: freeform chat, one-off generation, or any task where the variable portion is much larger than the cached prefix.
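A minimal Python sketch of the per-request arithmetic behind the ratio rule. It assumes cache reads bill at 0.1× the base input rate and that prefixes under 1,024 tokens are not cached at all. Both values come from Anthropic's published Sonnet pricing and docs, so check the current values for your model. `input_savings` is a hypothetical helper, and the one-time cache write cost is ignored here.

```python
# Assumptions (check current docs for your model): cache reads bill at 0.1x
# the base input rate, and prefixes under 1,024 tokens are not cached.
CACHE_READ_MULTIPLIER = 0.1
MIN_CACHEABLE_TOKENS = 1024


def input_savings(prefix_tokens: int, variable_tokens: int) -> float:
    """Fraction of per-request input cost saved on a cache hit.

    Only the prefix is cached; the variable tail always bills at the full
    rate. The one-time cache write cost is ignored here.
    """
    if prefix_tokens < MIN_CACHEABLE_TOKENS:
        return 0.0  # prefix too short to be cached at all
    uncached = prefix_tokens + variable_tokens
    cached = prefix_tokens * CACHE_READ_MULTIPLIER + variable_tokens
    return 1 - cached / uncached


for prefix, variable in [(2000, 50), (10_000, 1_000), (2000, 20_000), (100, 2000)]:
    ratio = prefix / variable
    saved = input_savings(prefix, variable)
    print(f"prefix={prefix:>6} variable={variable:>6} ratio={ratio:6.2f}:1 saved={saved:.1%}")
```

Run as-is, it reports roughly 87.8%, 81.8%, 8.2% and 0.0% saved. The benefit tracks the prefix-to-variable ratio, and a prefix below the minimum length is never cached.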
Journey Context:
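Prompt caching discounts only the repeated prefix; the variable portion still bills at the full input rate. A 2,000-token system prompt with a 50-token variable input serves ~97% of its input tokens from cache, a large saving. A 100-token system prompt with a 2,000-token variable input gets little or no benefit. In fact, a prefix that short falls below the minimum cacheable length (1,024 tokens for Sonnet models per Anthropic's docs) and is not cached at all. Common mistake: padding the prompt with extra instructions to "increase cache savings". This degrades quality if the instructions are redundant or contradictory.
Anthropic's cache entries have a short TTL (5 minutes by default, refreshed each time the cached prefix is read). Batch requests that share a prefix within that window to maximize the hit rate. Cost: for Sonnet, cache reads bill at $0.30/M vs. $3/M for uncached input, 10x cheaper on the cached portion. Cache writes bill at $3.75/M (1.25x the base rate). The real ROI calculation for N requests sharing a prefix of P tokens compares two costs. Without caching, the cost is N × P × full_rate. With caching, it is P × write_rate + (N − 1) × P × cached_rate, assuming every request after the first lands within the TTL. For high-volume fixed-prefix tasks, the cost ratio on the cached portion approaches, but cannot exceed, 10x.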
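The ROI formula and the TTL batching advice can be combined in a small simulation. This is a sketch under stated assumptions: Sonnet's $3/M base input rate, 1.25× for 5-minute cache writes, 0.1× for reads, and a 300-second TTL refreshed on every hit. `prefix_cost` is a hypothetical helper. Only the shared prefix is priced, because the variable portion bills at the full rate either way.

```python
from typing import Optional, Sequence, Tuple

# Assumptions (Anthropic's published Sonnet pricing; verify before relying on
# it): $3/M base input, 5-minute cache writes at 1.25x, reads at 0.1x, and a
# 300 s TTL that is refreshed every time the cached prefix is read.
BASE_RATE = 3.00 / 1_000_000  # dollars per input token
WRITE_MULTIPLIER = 1.25
READ_MULTIPLIER = 0.1
TTL_SECONDS = 300


def prefix_cost(arrivals: Sequence[float], prefix_tokens: int) -> Tuple[float, float]:
    """Return (uncached_cost, cached_cost) in dollars for the shared prefix.

    `arrivals` are request timestamps in seconds, sorted ascending. The
    variable portion is left out because it bills at full rate either way.
    """
    per_request = prefix_tokens * BASE_RATE
    uncached = len(arrivals) * per_request
    cached = 0.0
    last_use: Optional[float] = None
    for t in arrivals:
        if last_use is not None and t - last_use < TTL_SECONDS:
            cached += per_request * READ_MULTIPLIER  # cache hit
        else:
            cached += per_request * WRITE_MULTIPLIER  # first request or expired entry
        last_use = t  # a hit refreshes the TTL; a write starts it
    return uncached, cached


batched = [i * 10.0 for i in range(1000)]  # one request every 10 s
spread = [i * 600.0 for i in range(1000)]  # one request every 10 min

for name, arrivals in [("batched", batched), ("spread", spread)]:
    uncached, cached = prefix_cost(arrivals, prefix_tokens=2000)
    print(f"{name}: ${uncached:.2f} uncached vs ${cached:.2f} cached ({uncached / cached:.1f}x)")
```

The simulation uses a 2,000-token prefix. When 1,000 requests are spaced 10 seconds apart, they cost about $0.61 cached vs. $6.00 uncached (about 9.9x). When the same requests are spaced 10 minutes apart, every one misses the TTL. They then cost $7.50 cached, more than not caching at all.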
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:20:44.621076+00:00— report_created — created