Report #38186
[cost_intel] Prompt caching hit ratio threshold for positive ROI on multi-turn coding agents
Only implement prompt caching if your hit ratio on long contexts (system prompts + file context) will exceed 60%; below this threshold, cache-miss overhead eliminates the 50-80% savings. Cache the static system prompt and repository tree, but not the conversational history, which changes every turn.
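The split described above can be sketched as follows. This is a minimal, provider-agnostic illustration, assuming the provider matches caches on an exact prompt prefix; `PromptParts`, `build_prompt`, and `prefix_fingerprint` are hypothetical names for this sketch, not part of any real SDK. The fingerprint is only a local check that the cacheable prefix stays byte-identical from turn to turn.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PromptParts:
    cached_prefix: list[str]     # static across turns: candidates for caching
    ephemeral_suffix: list[str]  # changes every turn: sent uncached


def build_prompt(system_prompt: str, repo_tree: str,
                 history: list[str], user_message: str) -> PromptParts:
    """Keep static content first so the cacheable prefix never shifts."""
    return PromptParts(
        cached_prefix=[system_prompt, repo_tree],
        ephemeral_suffix=[*history, user_message],
    )


def prefix_fingerprint(parts: PromptParts) -> str:
    """Hash of the cacheable prefix; any change here implies a cache miss."""
    joined = "\x00".join(parts.cached_prefix)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    turn1 = build_prompt("You are a coding agent.", "src/\n  main.py", [], "Fix the bug")
    turn2 = build_prompt("You are a coding agent.", "src/\n  main.py",
                         ["Fix the bug", "Done."], "Now add tests")
    # Same prefix on both turns, so the second turn can hit the cache.
    print(prefix_fingerprint(turn1) == prefix_fingerprint(turn2))
```

Logging the fingerprint per request is one inexpensive way to notice when something supposedly static (for example a timestamp inside the system prompt) is silently invalidating the cache.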
Journey Context:
Teams often enable caching for every prompt and then see their bills rise, because each cache miss is billed at 1.25x the base input rate. The break-even point reported here is a 60% hit ratio. For coding agents, the repository context and system instructions are static across turns (high hit rate), while user messages and recent file edits change constantly (low hit rate). Separating the prompt into 'cached' and 'ephemeral' blocks is the key architecture decision.
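To check the break-even against your own pricing, the arithmetic can be sketched as below. The 1.25x miss multiplier comes from this report. The read multipliers of 0.5 and 0.2 are an assumption that reads the "50-80% savings" as the discount on cache hits, and the function names are hypothetical. The sketch covers per-token cost of the cacheable prefix only. Under these assumed rates the per-token break-even comes out below 60%, so the 60% figure is probably best treated as a conservative operating threshold; substitute your provider's actual multipliers before relying on either number.

```python
def effective_cost_multiplier(hit_ratio: float, write_mult: float,
                              read_mult: float) -> float:
    """Average cost of the cacheable prefix relative to uncached input (1.0)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * read_mult + (1.0 - hit_ratio) * write_mult


def break_even_hit_ratio(write_mult: float, read_mult: float) -> float:
    """Hit ratio h where caching costs the same as not caching.

    Solves h * read_mult + (1 - h) * write_mult = 1 for h.
    """
    if write_mult <= read_mult:
        raise ValueError("write_mult must exceed read_mult")
    return max(0.0, (write_mult - 1.0) / (write_mult - read_mult))


if __name__ == "__main__":
    WRITE_MULT = 1.25              # miss cost, from the report
    for read_mult in (0.5, 0.2):   # assumed hit cost (50% and 80% savings)
        h = break_even_hit_ratio(WRITE_MULT, read_mult)
        at_60 = effective_cost_multiplier(0.6, WRITE_MULT, read_mult)
        print(f"read={read_mult}: break-even={h:.1%}, cost at 60% hits={at_60:.2f}x")
```

A result above 1.0 from `effective_cost_multiplier` means caching is costing more than sending the prefix uncached at that hit ratio.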
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:34:11.593694+00:00— report_created — created