Report #71147
[cost\_intel] Anthropic prompt cache misses causing 10x cost spikes in production
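Keep everything up to each cache\_control breakpoint byte-for-byte identical across requests; any change in that prefix, including whitespace, invalidates the cache. Sampling parameters such as temperature are not documented as part of the cached prefix, so verify before blaming them for misses. Monitor cache hits via the usage fields in the response body \(cache\_read\_input\_tokens and cache\_creation\_input\_tokens\) rather than response headers, and alert when the read:write ratio drops below a threshold.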
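The monitoring step can be sketched as a small accumulator. This is a minimal illustration, not a definitive implementation: the usage field names are those of the Messages API response body, while `CacheStats`, `should_alert`, `min_ratio`, and `min_requests` are hypothetical names and thresholds chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Running totals of cache token counts across responses (hypothetical helper)."""
    requests: int = 0
    read_tokens: int = 0
    write_tokens: int = 0

    def record(self, usage: dict) -> None:
        # Missing or null fields are treated as zero tokens.
        self.requests += 1
        self.read_tokens += usage.get("cache_read_input_tokens") or 0
        self.write_tokens += usage.get("cache_creation_input_tokens") or 0

    def read_write_ratio(self) -> float:
        # No writes: infinite ratio if anything was read, else zero (cache unused).
        if self.write_tokens == 0:
            return float("inf") if self.read_tokens else 0.0
        return self.read_tokens / self.write_tokens


def should_alert(stats: CacheStats, min_ratio: float = 5.0, min_requests: int = 20) -> bool:
    """Return True once enough requests were seen and the read:write ratio is low.

    min_ratio and min_requests are assumed defaults; tune them to your traffic.
    """
    if stats.requests < min_requests:
        return False
    return stats.read_write_ratio() < min_ratio
```

Call `stats.record(...)` with the usage object of each response, converted to a dict, and check `should_alert(stats)` on a schedule. With the official Python SDK, `response.usage.model_dump()` should produce a suitable dict, though that is worth confirming against the SDK version in use.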
Journey Context:
Teams assume prompt caching 'just works' after initial setup. However, a cache hit requires an exact match on the prompt prefix, so dynamic elements in that prefix, such as timestamps, session IDs, or even a changed JSON key order, silently break the cache. The cost impact is severe: cached input reads are ~10x cheaper than uncached input \(e.g., $0.30 vs $3.00 per 1M tokens on Claude 3.5 Sonnet\), and each miss is also billed as a cache write at a premium over the base input price. Without explicit monitoring, teams only notice the bill spike at month-end.
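One way to catch this kind of silent drift is to normalize the cached prefix before sending it and to log a fingerprint of what was actually sent. The sketch below assumes the prefix (system text, tool definitions) is JSON-serializable; `canonicalize` and `fingerprint` are hypothetical helpers, and the digest is only a local diagnostic, not the cache key the API computes.

```python
import hashlib
import json


def canonicalize(obj):
    """Return a copy whose dict keys are in sorted order at every nesting level."""
    # Round-tripping through sorted JSON rebuilds dicts in a stable key order.
    return json.loads(json.dumps(obj, sort_keys=True))


def fingerprint(prefix) -> str:
    """Hash the prefix exactly as ordered, so any drift changes the digest."""
    raw = json.dumps(prefix, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Same tool definition, built with different key order.
tool_a = {"name": "lookup", "input_schema": {"type": "object", "properties": {}}}
tool_b = {"input_schema": {"properties": {}, "type": "object"}, "name": "lookup"}

# Key order alone changes the fingerprint of the raw objects...
drifted = fingerprint(tool_a) != fingerprint(tool_b)
# ...but not after canonicalization.
stable = fingerprint(canonicalize(tool_a)) == fingerprint(canonicalize(tool_b))
```

Logging the fingerprint next to each request's cache token counts makes it possible to tell whether a miss coincided with a changed prefix. Values that must vary per request, such as timestamps or session IDs, are best placed after the last cache breakpoint so they never enter the hashed prefix.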
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:59:36.359223+00:00— report_created — created