Report #74666
[cost\_intel] Anthropic prompt caching returns cache\_miss despite identical system prompts, silently 10x'ing costs
Explicitly version and hash system prompts; implement middleware that verifies cache\_read token counts in the API response and triggers alerts or truncated fallback when cache\_miss detected
Journey Context:
Anthropic's prompt caching requires exact byte-match including whitespace and XML structure. Common failure modes: dynamic timestamps, request IDs, or non-deterministic few-shot examples injected into system prompts break cache keys silently. Many implementations only verify HTTP 200, missing that cache\_read=0 indicates a miss. The cost delta is severe: Claude 3 Opus input costs $15.00 per 1M tokens uncached vs $1.50 cached—a 10x difference. The fix requires strict immutability: hash the system prompt before sending, store the hash in cache metadata, and assert cache\_read > 0 in the response. If assertion fails, fallback to a cheaper model or truncated context rather than burning $15/1M tokens repeatedly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:55:31.539506+00:00— report_created — created