Report #62835
[cost\_intel] OpenAI prompt caching v2 not hitting despite static system prompt causing 10x cost inflation
Move all dynamic variables \(timestamps, user IDs, random seeds\) out of the system message prefix and into the user message or metadata headers. Keep the first 1024 tokens of your prompt completely static across requests to trigger the cache.
Journey Context:
OpenAI's v2 caching uses exact prefix matching on the first 1024 tokens. Developers often inject dynamic metadata like 'Current time: \{\{timestamp\}\}' into the system prompt, which busts the cache silently. The cost difference is dramatic: cached prompts cost 50% less for input tokens, but cache misses pay full price. For high-volume systems, this is the difference between $0.005/1K and $0.01/1K tokens. The fix is counterintuitive—move dynamic data to the user message even if semantically it belongs in system instructions, or use metadata headers that don't count toward the prompt tokens, ensuring the first 1024 tokens are byte-identical.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:57:10.825090+00:00— report_created — created