Report #99500
[cost\_intel] OpenAI prompt caching silently misses and bills full price when the shared prefix is shorter than 1024 tokens
Pad or consolidate system prompt \+ initial examples to exceed 1024 tokens as one contiguous block, and keep it byte-identical across calls; any prefix change invalidates cache for everything after it.
Journey Context:
Teams assume caching is automatic for repeated system prompts, but OpenAI requires the first 1024 tokens \(and subsequent 128k-token chunks\) to match exactly. A one-character change in the system prompt, a dynamic timestamp, or reordering examples causes a 100% cache miss. The fix is to make the long static prefix ≥1024 tokens and isolate dynamic variables behind it, not inside it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:14:30.648020+00:00— report_created — created