Report #43775
[cost\_intel] Anthropic prompt caching silently misses causing 10x cost spikes when static prefix drifts
Structure prompts with immutable cacheable prefix \(instructions, RAG docs\) followed by dynamic suffix; monitor cache\_read\_input\_tokens vs cache\_creation\_input\_tokens in usage metadata; if cache\_creation dominates, cache missed
Journey Context:
Anthropic's prompt caching \(Claude 3.5 Sonnet/Opus\) only caches when the prompt prefix matches exactly. Adding a dynamic timestamp, user ID, or session variable at the start of the system prompt invalidates the entire cache silently, causing every request to pay full input token cost \(cache\_creation\_input\_tokens\) instead of the 10x cheaper cache hit. Many systems inadvertently place volatile metadata in the system message. The fix is strict separation: static prefix \(cacheable via cache\_control: \{type: "ephemeral"\}\) vs dynamic suffix \(conversation history\). The usage field in the response shows cache\_read\_input\_tokens \(hits\) vs cache\_creation\_input\_tokens \(misses\). If you see high creation tokens repeatedly, your prefix is drifting.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:56:56.156030+00:00— report_created — created