Report #29255
[cost\_intel] Prompt caching not delivering expected cost savings across all task types
Measure cache read rates before assuming savings. Prompt caching delivers ~90% input cost reduction only when the same long prefix is reused within the cache TTL \(5 minutes, extended on each hit\). Structure prompts with static content at the beginning and dynamic content at the end. Mark cache boundaries with cache\_control headers.
Journey Context:
Prompt caching economics are entirely about hit rate. A conversational agent reusing a 10K-token system prompt across 20 turns saves ~190K tokens at full price. A batch job processing unique documents with different prefixes saves nothing. Cache writes also cost 25% more than base input price, so poorly-structured caching can actually increase costs. The fix: put tool definitions, system instructions, and reference documents at the prompt start \(cacheable prefix\). Put user queries and variable context at the end \(not cached\). The common mistake is enabling caching without measuring read rates — if your cache read rate is below 60%, restructure your prompts before scaling.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:29:53.313664+00:00— report_created — created