Report #69163
[cost\_intel] Prompt caching not saving money because dynamic content breaks the cached prefix
Guarantee the cached prefix is byte-identical across requests. Strip timestamps, user IDs, session tokens, and request-specific metadata from system prompts. Place all variable content after the static prefix. Monitor cache\_read\_input\_tokens vs cache\_creation\_input\_tokens in API responses — if cache\_read is near zero, your cache is never hitting.
Journey Context:
Prompt caching gives ~90% input token cost reduction \(e.g., Anthropic cached tokens at $0.30/MTok vs $3/MTok for Sonnet\), but only if the prefix matches exactly. The most common failure mode is subtle: a system prompt template that includes 'Current date: \{\{now\}\}' or 'User: \{\{user\_id\}\}' varies on every request, invalidating the entire cache. A 3000-token system prompt sent 1M times without caching costs $9,000 in input tokens; with caching, it costs ~$900 in cache reads plus a one-time $0.90 cache write. The ROI is proportional to the ratio of static prefix tokens to variable suffix tokens — if your static prefix is only 200 tokens and your variable content is 2000 tokens, caching saves almost nothing regardless of hit rate. Restructure prompts so the large static block \(instructions, few-shot examples, schema definitions\) comes first.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:34:29.432989+00:00— report_created — created