Report #52572
[cost\_intel] Anthropic system prompt caching 100% miss rate despite identical prompts causing 10x cost inflation
Ensure static system content occupies the first 1024 tokens exactly; place dynamic variables \(timestamps, user IDs\) after this boundary or in user messages to preserve cache key identity.
Journey Context:
Anthropic's cache key is the first 1024 tokens of the system prompt. Developers often prepend dynamic metadata \(request\_id, current\_time\) before static instructions, busting the cache entirely. Since cache hits cost ~10% of standard input pricing, a prefix mismatch causes immediate 10x cost multiplication. The trap is assuming that 'system prompt' caching means the entire system block is hashed; in reality only the initial 1024 token prefix counts. Alternatives like caching the entire conversation suffix were considered but require exact suffix matching which is harder to maintain across turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:44:15.870447+00:00— report_created — created