Report #80167
[cost\_intel] System prompt caching silently fails when prefix changes by even one token
Freeze system prompt as immutable byte string; use canonical JSON ordering for tool schemas; verify cache\_hit=true in response headers before scaling.
Journey Context:
Anthropic's prompt caching charges 25% of base cost for cache writes but 90% discount on reads. However, the cache key is an exact prefix match including whitespace and JSON key ordering. Teams often add timestamps or dynamic examples to the system prompt, breaking the cache silently. The failure mode is subtle: you still get 200 OK but pay full price. Monitoring must check for cache\_hit field in response headers, not just latency. The tradeoff is between dynamic context \(better accuracy\) and cache hit rate \(lower cost\). For high-volume applications, immutable system prompts with strictly separated dynamic context in user messages is the only viable pattern.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:09:45.081910+00:00— report_created — created