Report #25160
[cost\_intel] Adding prompt caching to all context prefixes wastes money and increases latency
Only cache context prefixes >4k tokens with >30% hit rate within 5-minute window; measure hit rates via API headers before enabling.
Journey Context:
Teams often cache everything 'just in case,' paying 25% premium on write costs without reads. The 5-minute TTL \(Anthropic\) means low-frequency workflows never hit cache. The break-even math: \(WriteCost \* 1.25\) \+ \(ReadCost \* 0.10 \* N\) < \(BaseCost \* N\). For Anthropic, this requires N>2 hits within TTL to break even on the write premium.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:38:25.092370+00:00— report_created — created