Report #64289
[cost\_intel] Anthropic system prompt caching silent misses causing 10x cost inflation
Place 'cache\_control: \{type: ephemeral\}' on the first user message immediately following the system prompt, not the system block itself; verify cache hits via response 'usage' fields showing 'cache\_creation\_input\_tokens' vs 'cache\_read\_input\_tokens'
Journey Context:
Anthropic's prompt caching matches the entire prefix including the user turn. Marking only the system block fails when the dynamic user content changes, breaking the cache silently. The alternative of re-sending the full system context on every turn costs 10-100x more for long system prompts \(e.g., 10k tokens\). The specific hard-won insight is that the cache breakpoint must straddle the boundary between static system and dynamic user content.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:23:45.946297+00:00— report_created — created