Report #40289
[cost_intel] Anthropic prompt caching silently disabled by non-matching prefixes, causing 10x cost spikes
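Keep the exact immutable prefix (system prompt + initial user message) in a separate variable and never modify the cached portion mid-conversation. Verify cache hits by comparing the 'cache_creation_input_tokens' and 'cache_read_input_tokens' usage fields: a hit reports a non-zero 'cache_read_input_tokens', while a miss re-writes the prefix and reports it under 'cache_creation_input_tokens'.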
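A minimal sketch of that check, assuming the usage block has already been converted to a plain dict (with the official Python SDK, `response.usage.model_dump()` should produce one). The helper name `cache_status` and its return labels are hypothetical, chosen here for illustration only.

```python
def cache_status(usage: dict) -> str:
    """Classify one response's usage block as 'hit', 'write', or 'none'.

    'hit'   : at least part of the prompt was read from the cache
    'write' : nothing was read, but a new cache entry was created
    'none'  : caching was not involved in this request
    """
    # Either field may be absent or None when caching is not in use.
    read = usage.get("cache_read_input_tokens") or 0
    created = usage.get("cache_creation_input_tokens") or 0
    if read > 0:
        return "hit"
    if created > 0:
        return "write"
    return "none"


# Hypothetical usage blocks from three consecutive requests.
first = {"input_tokens": 40, "cache_creation_input_tokens": 5200, "cache_read_input_tokens": 0}
second = {"input_tokens": 55, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 5200}
third = {"input_tokens": 5300}

print(cache_status(first), cache_status(second), cache_status(third))
```

A run of 'write' results on requests that share the same long prefix is the signature of the problem described in this report: the prefix is being re-cached every time instead of being read.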
Journey Context:
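Teams often assume that adding a timestamp or random ID to the system prompt is harmless, but Anthropic's prompt caching only reuses a prefix that matches a prior request exactly, up to the cache breakpoint. A single byte change anywhere in that prefix invalidates the cache, so the long context is re-processed in full. The API returns caching metadata in the usage block, but many teams don't check it. Note that 'ephemeral' is the cache type set on the 'cache_control' breakpoint itself rather than a separate alternative; cached entries expire after a short TTL (5 minutes by default, refreshed on each hit), so long gaps between requests also cause misses.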
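One way to keep volatile values out of the cached prefix is sketched below: the stable prompt sits in its own block ending at the 'cache_control' breakpoint, and the per-request ID goes in a later block. The names `STABLE_SYSTEM_PROMPT`, `build_system_blocks`, and `prefix_fingerprint` are hypothetical, and the fingerprint is a local sanity check for prefix drift, not something the API provides.

```python
import hashlib
import json

# Assumption: the long, unchanging instructions live in one constant.
STABLE_SYSTEM_PROMPT = "You are a support assistant. <long instructions here>"


def build_system_blocks(request_id: str) -> list:
    """Return system blocks with the volatile ID placed after the cache breakpoint."""
    return [
        # Cached portion: must be byte-identical across requests.
        {
            "type": "text",
            "text": STABLE_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        },
        # Uncached portion: free to change on every request.
        {"type": "text", "text": f"Request ID: {request_id}"},
    ]


def prefix_fingerprint(system_blocks: list):
    """Hash everything up to and including the last block with cache_control.

    Returns None when no block carries a cache breakpoint.
    """
    last = max(
        (i for i, block in enumerate(system_blocks) if "cache_control" in block),
        default=None,
    )
    if last is None:
        return None
    payload = json.dumps(system_blocks[: last + 1], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


blocks_a = build_system_blocks("req-1")
blocks_b = build_system_blocks("req-2")

# The full block lists differ, but the cached prefix does not.
print(blocks_a != blocks_b, prefix_fingerprint(blocks_a) == prefix_fingerprint(blocks_b))
```

Logging the fingerprint alongside the usage fields makes it easier to tell a prefix change apart from a TTL expiry when a miss shows up.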
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:05:51.508640+00:00— report_created — created