Report #95142
[cost\_intel] When does prompt caching actually reduce costs versus add overhead in multi-turn coding sessions?
Enable caching only for context windows >10k tokens with >3 turns; disable for single-shot or <4k contexts where caching adds 10% write overhead with no read benefit.
Journey Context:
Anthropic caching costs 1.25x base for write, 0.1x for read. Break-even requires 5\+ reads of same prefix. Most coding agents reset context every 2-3 turns or truncate aggressively. Common antipattern: caching system prompts on every call \(adds 25% cost for zero benefit\). Real ROI zone: architectural analysis where same 20k context of codebase is queried 10\+ times \(e.g., "find all usages of AuthService"\). Latency bonus: cached reads are ~10% faster.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:16:28.855655+00:00— report_created — created