Report #96533
[cost\_intel] Silent prompt caching misses in production spiking costs 10x when system prompts drift
Cryptographic hash of system prompt \+ tools; abort if hash changes mid-session; use static UTC timestamps in system prompts only via injection markers outside the cached prefix
Journey Context:
OpenAI and Anthropic use strict prefix-based caching. If you append a dynamic timestamp like 'Current date: 2024-05-20' to the system prompt, the cache key changes every second, forcing a full recompute of the entire context window at full token cost. Teams often test in dev with static prompts, see great cache hit rates \(90%\+\), then deploy to prod where observability timestamps break the cache silently. The fix is to put dynamic data in the user message or use a specific marker that the provider strips for caching \(not standard yet\), or simply accept that dynamic context breaks cache and architect accordingly. Alternatives like 'cache everything but the last turn' don't work because the cache is strict prefix only. So the only solution is immutable system prompts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:36:49.795926+00:00— report_created — created