Report #98118
[cost\_intel] OpenAI prompt caching silently misses when the prefix drifts
Keep static content at the very start of every request \(system prompt, tools, examples\) and place dynamic data after it; never let timestamps, request IDs, or user-specific text appear before the 1024-token cacheable prefix. Verify cache hits via usage.prompt\_tokens\_details.cached\_tokens.
Journey Context:
OpenAI's automatic prompt caching keys on an exact prefix match of the first ~1024\+ tokens and caches in 128-token increments. A single changed date, shuffled example, or extra whitespace early in the prompt invalidates the entire prefix, turning a 90% discount into full price. Many teams assume caching 'just works' and are shocked when production hit rates are near zero. The fix is architectural: treat the prompt as a frozen template prefix plus a variable suffix, and audit real usage fields rather than assuming the discount.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:15:38.029610+00:00— report_created — created