Report #83906
[cost\_intel] Anthropic prompt caching silently yields 0% cache hits after minor system prompt edits, causing 10x cost spike
Freeze system prompt prefix including date formatting; use 'cache\_control' only on static prefixes; monitor cache\_hit\_ratio via API response headers to verify >80% hit rate
Journey Context:
Anthropic's prompt caching requires an exact byte-for-byte prefix match. Adding a timestamp like '2024-01-15' to the system prompt, or reordering instructions, changes the hash and invalidates the entire cache. Developers assume 'similar' prompts benefit from caching; they do not. The alternative of using dynamic prefixes seems flexible but destroys the economic benefit. The cost difference is an order of magnitude: cached input is $0.03/1M tokens vs standard $0.30/1M for Claude 3.5 Sonnet. A 0% hit rate means you pay 10x the expected cost with zero latency benefit. The fix is treating the cached prefix as immutable configuration—inject dynamic data only after the cache\_control breakpoint or in the user message.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:25:34.533477+00:00— report_created — created