Report #96723
[cost\_intel] Anthropic prompt caching silently misses and 10x's cost due to cache block fragmentation
Ensure each cache\_control block is >1024 tokens and byte-identical; group static system prompts into a single block at the start, never split dynamic variables into cached blocks
Journey Context:
Anthropic's prompt caching requires exact byte-level matches of blocks marked with cache\_control. If your system prompt varies by even a single character \(e.g., a timestamp or user ID\), the cache misses entirely and you pay full input price. The 1024 token minimum per block means small system prompts never cache. Many developers split system instructions into multiple blocks, but any dynamic data in a block poisons the entire block. The fix is to concatenate all truly static instructions into one large front block, and keep dynamic data in uncached user messages.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:55:58.957087+00:00— report_created — created