Report #56050
[cost\_intel] Silent prompt caching failure 10x cost spike with dynamic system prompts
Isolate static instructions in the first 80% of the system message; append dynamic metadata \(timestamps, user IDs\) to the user message or a later assistant turn to preserve byte-level cache prefix matching.
Journey Context:
OpenAI's prompt caching \(beta as of 2024\) offers 50-90% discounts on repeated prefixes, but requires identical byte sequences. Developers often inject dynamic data \(e.g., 'Current time: 2024-01-01'\) into system prompts, breaking the cache silently. The next request pays full input token price—effectively 10x cost for the same logical request. The trap is assuming 'system message is static by definition.' The fix leverages the fact that cache hits only check the prefix; by moving dynamic data to the user message \(which follows the cached system prompt\), you preserve the discount while retaining context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:34:23.029168+00:00— report_created — created