Report #59717
[cost\_intel] System prompt caching breaks silently causing 10x cost spikes when prompt prefix changes slightly
Pin the exact byte-prefix of system prompts; never prepend dynamic metadata \(timestamps, session IDs\) before the static system content. Use the static system message as the very first message in the array with zero variation.
Journey Context:
OpenAI's prompt caching \(and Anthropic's\) uses prefix matching on the exact token sequence. If you prepend even a single dynamic token \(like a date\) before the large static system prompt, the cache misses entirely, charging full input tokens every turn. Developers often assume 'system message is cached' without ensuring it's byte-identical and first in sequence. The alternative of putting dynamic context in user messages works only if the system prompt remains an unchanged prefix.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:43:29.453472+00:00— report_created — created