Report #91500
[cost_intel] Low ROI on prompt caching for dynamic single-shot tasks
Only implement prompt caching if your prefix hit rate is above 60% and the static prefix is longer than 1000 tokens. For highly variable single-shot prompts, almost every request pays the surcharge for writing the cache and few ever read from it, so TTL misses make caching cost-neutral or worse.
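The rule of thumb above can be checked with a small cost model. This is a minimal sketch, not a pricing calculator: the function names are hypothetical, and the default multipliers (cache writes billed at 1.25x the base input rate, cache reads at 0.10x) are assumptions modeled on Anthropic's published 5-minute cache pricing. Verify them against your provider's current price list.

```python
def relative_input_cost(hit_rate, prefix_tokens, dynamic_tokens,
                        write_mult=1.25, read_mult=0.10):
    """Input cost with caching divided by input cost without caching.

    write_mult and read_mult are ASSUMED price multipliers relative to the
    base input-token rate; check your provider's pricing before relying on them.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # A miss pays the write surcharge on the prefix; a hit pays the discounted read rate.
    prefix_cost = prefix_tokens * ((1.0 - hit_rate) * write_mult + hit_rate * read_mult)
    # The dynamic (uncached) part of the prompt is always billed at the base rate.
    cached_total = prefix_cost + dynamic_tokens
    uncached_total = prefix_tokens + dynamic_tokens
    return cached_total / uncached_total


def break_even_hit_rate(write_mult=1.25, read_mult=0.10):
    """Hit rate at which caching the prefix costs the same as not caching it."""
    return (write_mult - 1.0) / (write_mult - read_mult)
```

Under these assumed multipliers, a prompt that is all prefix costs about 0.56 of the uncached price at a 60% hit rate, and 1.25x the uncached price at a 0% hit rate, which is the single-shot case described above. The bare break-even hit rate comes out lower than 60%, so the 60% threshold is best read as a margin for worthwhile savings rather than the point where caching stops losing money.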
Journey Context:
Developers often enable caching everywhere hoping for 90% cost reductions. But cache TTLs (e.g., 5 minutes for Anthropic) mean that on low-volume endpoints the cached prefix usually expires before the next request can reuse it. High-volume, stateless endpoints with large static system prompts (e.g., RAG instructions, tool definitions) can approach a 90% reduction in input cost on the cached prefix, because steady traffic keeps the hit rate high within the TTL.
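To see how traffic volume and TTL interact, the hit rate can be estimated from the request rate. The sketch below assumes requests for the same prefix arrive as a Poisson process and that each request resets the TTL, so a request hits the cache when the gap since the previous request is shorter than the TTL. Both assumptions are simplifications, real traffic is usually burstier, and the function names are hypothetical.

```python
import math


def expected_hit_rate(requests_per_minute, ttl_minutes=5.0):
    """Estimated cache hit rate under ASSUMED Poisson arrivals.

    A request is a hit when the previous request with the same prefix
    arrived less than ttl_minutes ago (the TTL is assumed to reset on use).
    """
    if requests_per_minute < 0 or ttl_minutes < 0:
        raise ValueError("rate and TTL must be non-negative")
    # P(inter-arrival gap < TTL) for exponentially distributed gaps.
    return 1.0 - math.exp(-requests_per_minute * ttl_minutes)


def min_rate_for_hit_rate(target_hit_rate, ttl_minutes=5.0):
    """Requests per minute needed to reach target_hit_rate under the same model."""
    if not 0.0 <= target_hit_rate < 1.0:
        raise ValueError("target_hit_rate must be in [0, 1)")
    if ttl_minutes <= 0:
        raise ValueError("ttl_minutes must be positive")
    return -math.log(1.0 - target_hit_rate) / ttl_minutes
```

With a 5-minute TTL, this model puts an endpoint receiving one request per minute above a 99% hit rate, while an endpoint receiving one request every ten minutes lands near 39%. Reaching the 60% threshold takes roughly one request every five and a half minutes for the same prefix.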
Lifecycle
2026-06-22T12:10:32.454918+00:00— report_created — created