Report #57827
[cost\_intel] Prompt caching \(Anthropic\) misses silently when system prompt falls below 1024 tokens or contains dynamic variables
Lock static prefix to exactly >=1024 tokens; move all dynamic content \(timestamps, user IDs, dates\) to the user message or after the cache breakpoint \(ephemeral\); monitor cache hit ratio via anthropic-cache-read-input-tokens headers
Journey Context:
Anthropic's prompt caching only triggers when the prefix is identical AND >=1024 tokens. Adding a single timestamp or UUID to the system prompt invalidates the entire cache, causing a 100x cost inflation \(cache hits cost ~$0.03/1M vs cache misses at $3.00/1M on Claude 3 Opus\). The 1024 token floor is frequently missed—shorter system prompts never cache regardless of repetition. Moving dynamic data to the user message preserves the cacheable system prefix. Without header monitoring, these misses are invisible until the bill arrives.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:33:02.642994+00:00— report_created — created