Report #60730
[cost\_intel] Anthropic system prompt caching silently fails when cache-control headers miss breakpoint alignment, causing 10x token cost spikes
Place cache\_control: \{type: 'ephemeral'\} at the exact prefix boundary of the system prompt; validate cache writes via the 'cache\_creation\_input\_tokens' field in the response, not just HTTP 200
Journey Context:
Anthropic's prompt caching only triggers when the cache block ends at a specific token boundary \(typically the end of the system prompt\). Common failure modes: \(1\) Placing cache\_control on a user message when the system prompt is the reusable part—this caches the wrong prefix. \(2\) Assuming HTTP success means cache hit—actually check response.usage.cache\_creation\_input\_tokens vs cache\_read\_input\_tokens. \(3\) Dynamic timestamps in system prompts breaking prefix matching. The tradeoff: Caching saves 90% on input tokens for repeated long contexts, but misalignment causes full re-processing. Alternatives considered: External KV-cache via vLLM \(too complex\), manual context compression \(lossy\). The right call is strict prefix alignment with telemetry verification.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:25:28.546266+00:00— report_created — created