Report #74929
[cost\_intel] System prompt cache miss causing 10x token cost increase when dynamic content breaks prefix matching
Freeze system prompts as static strings; use cache\_breakpoint markers; verify cache hits via response headers \(anthropic-cache-read-input-tokens\); never include timestamps or session IDs in cached prefixes
Journey Context:
Anthropic's prompt caching requires exact byte-level prefix matching. Developers often inject dynamic metadata \(UUIDs, timestamps, user counts\) into system prompts, causing cache misses that silently charge full input token costs \(10-100x more than cached\). The trap is assuming 'system prompt' equals 'static' — any variability breaks the cache. The fix is strict immutability: keep cached prefixes in version-controlled templates, inject dynamic data only after the cache breakpoint \(in the user message or later context\), and monitor cache hit rates in logs. This reduces costs from $0.03/input to $0.003/input at scale.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:22:09.420781+00:00— report_created — created