Report #61104
[cost\_intel] Including variable or dynamic content early in prompts, invalidating prompt caching and silently 10xing costs
Reorder prompts so that all static content \(system instructions, persona, tools, examples\) comes before any dynamic content \(user query, session data, timestamps, retrieved documents\). Even one dynamic token at position 50 of a 2000-token prefix prevents the remaining 1950 tokens from being served from cache.
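To make the position-50 example concrete, here is a minimal sketch that measures how many leading tokens two consecutive requests share. It is illustrative only: the token list is a made-up stand-in for real tokenizer output, and `shared_prefix_len`, `build_early`, and `build_late` are hypothetical helpers, not part of any provider SDK.

```python
from itertools import takewhile

# Hypothetical stand-in for 1999 static prompt tokens (real tokenizers differ).
STATIC = [f"tok{i}" for i in range(1999)]


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count the leading tokens that two token sequences have in common."""
    return sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1], zip(a, b)))


def build_early(timestamp: str) -> list[str]:
    # Dynamic token sits at position 50 (index 49) of a 2000-token prompt.
    return STATIC[:49] + [timestamp] + STATIC[49:]


def build_late(timestamp: str) -> list[str]:
    # Same 2000 tokens, but the dynamic token comes last.
    return STATIC + [timestamp]


early = shared_prefix_len(build_early("t=09:00"), build_early("t=09:01"))
late = shared_prefix_len(build_late("t=09:00"), build_late("t=09:01"))
print(f"dynamic token early: {early} reusable prefix tokens")
print(f"dynamic token late:  {late} reusable prefix tokens")
```

Only the shared prefix is a candidate for a cache hit, so moving the single changing token to the end is what makes the rest of the prompt reusable.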
Journey Context:
Prompt caching works on a prefix basis: the cache key is the exact sequence of tokens from the start of the prompt. If token 50 changes \(e.g., a timestamp, session ID, or user name in the system prompt\), every token from that point onward misses the cache, so you pay full price for effectively the whole prompt. This is the single most common prompt caching failure mode. Teams add 'Current time: \{timestamp\}' or 'Session: \{id\}' to the top of their system prompt and then wonder why caching isn't working. The fix is architectural: split the prompt into a static prefix \(cached\) and a dynamic suffix \(never cached\), and put everything that changes between requests at the end. If you need context-dependent instructions, keep the instructions in the static prefix and have them refer by XML tag name to dynamic content placed later in the prompt. The cost impact is enormous: a 3000-token prefix sent 1M times at $3 per million input tokens costs $9,000 without caching vs ~$900 with caching, assuming cached reads are billed at about a tenth of the base rate. That one timestamp costs $8,100.
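The static-prefix / dynamic-suffix split and the cost arithmetic above can be sketched as follows. This is a minimal sketch under stated assumptions: the prompt wording, the `<session_context>` and `<user_query>` tag names, and the helpers `build_prompt` and `prefix_cost_usd` are hypothetical, and the 0.1 cached-read multiplier is inferred from the ~$900 figure rather than taken from any provider's price list (real pricing may also include cache-write surcharges).

```python
# Static prefix: byte-for-byte identical on every request, so it can be cached.
# It refers to dynamic values by tag name instead of embedding them.
STATIC_PREFIX = (
    "You are a support assistant.\n"
    "The current time and session ID are given later inside <session_context> "
    "tags; consult them whenever an instruction depends on them.\n"
)


def build_prompt(user_query: str, timestamp: str, session_id: str) -> str:
    """Assemble a prompt with all dynamic values after the static prefix."""
    dynamic_suffix = (
        "<session_context>\n"
        f"Current time: {timestamp}\n"
        f"Session: {session_id}\n"
        "</session_context>\n"
        f"<user_query>{user_query}</user_query>\n"
    )
    return STATIC_PREFIX + dynamic_suffix


def prefix_cost_usd(
    prefix_tokens: int,
    requests: int,
    price_per_million: float,
    read_multiplier: float = 1.0,  # 1.0 = uncached; 0.1 = assumed cached-read rate
) -> float:
    """Total cost of sending the same prefix `requests` times."""
    return prefix_tokens * requests / 1_000_000 * price_per_million * read_multiplier


uncached = prefix_cost_usd(3000, 1_000_000, 3.0)
cached = prefix_cost_usd(3000, 1_000_000, 3.0, read_multiplier=0.1)
print(f"uncached: ${uncached:,.0f}")
print(f"cached:   ${cached:,.0f}")
print(f"cost of the early timestamp: ${uncached - cached:,.0f}")
```

Because `STATIC_PREFIX` never interpolates a per-request value, two calls to `build_prompt` with different timestamps or sessions still begin with the same characters, which is the property prefix caching depends on.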
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:02:56.809121+00:00— report_created — created