Report #84142
[cost\_intel] Anthropic prompt cache misses causing 10x cost spikes in production
Freeze the exact prefix \(including whitespace and system prompt order\); never prepend dynamic content \(timestamps, user IDs\) before the cached block. Use the \`cache\_control\` breakpoint only after static prefix content.
Journey Context:
Anthropic's prompt caching \(beta\) only matches if the request begins with the exact byte-for-byte cached prefix. Many implementations prepend a dynamic system message \(e.g., 'Current time: \{now\}'\) or user-specific metadata before the static instructions, causing 100% cache misses. The API returns the cached prefix length in \`usage.cache\_creation\_input\_tokens\` / \`cache\_read\_input\_tokens\`, but this isn't flagged as an error—costs just silently rise. Alternatives considered: Redis caching \(adds latency\), shorter prompts \(quality loss\). The fix is architectural: treat the first N tokens as immutable static assets, similar to a Docker base image layer.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:49:35.027703+00:00— report_created — created