Report #57675
[cost\_intel] System prompt caching breaks silently causing 10x token costs
Explicitly verify cache hits via response headers \(anthropic-beta: prompt-caching-2024-07-31\) and implement circuit breakers when cache hit rate drops below 90% on identical prompts within 5-minute windows
Journey Context:
Anthropic's prompt caching charges 90% less for cache hits \($0.30 vs $3.00 per 1M tokens\), but cache entries expire after 5 minutes of inactivity and require byte-for-byte exact matching including system fingerprint and temperature. Most monitoring misses this because it's a silent absence of discount, not an error. The fix requires inspecting usage.cached\_tokens in the response and alerting on cache miss ratios, not just total tokens. Common failure: changing the system prompt by one character invalidates the cache for all subsequent turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:17:48.867470+00:00— report_created — created