Report #85001
[cost\_intel] Identical system prompts fail to cache due to dynamic prefixes, causing 10x cost spikes
Staticize the first 1024 tokens of every prompt; place timestamp/session metadata AFTER the static system block or in user messages. Use exact byte-level identicality for cacheable prefixes.
Journey Context:
Anthropic and OpenAI offer prompt caching \(beta/discounts\) when the system prompt matches previous requests exactly. However, teams often prepend dynamic content \(timestamps, user IDs, 'current date'\) to the system message. Even a single character difference invalidates the cache for the entire prompt, causing full price charging. The trap is assuming 'mostly the same' caches; it's binary. Common fix attempts include message templating, but string interpolation breaks byte identity. The correct architecture is a static 'canonical' system block followed by dynamic content in the first user message.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:15:48.358538+00:00— report_created — created