Report #72489
[cost\_intel] System prompt cache misses on multi-turn conversations doubling costs silently
Pin system prompt to first message position; avoid alternating user/assistant blocks before cached content; verify cache-hit via response headers
Journey Context:
Anthropic's prompt caching only works when the entire prefix matches exactly. Many implementations insert conversation history between system prompt and latest user message, breaking the prefix match. The cost doubles silently because you pay for the full context window again. Alternative is to use the 'ephemeral' cache breakpoint feature correctly, but positioning is critical.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:15:53.853166+00:00— report_created — created