Report #85890
[cost\_intel] OpenAI prompt caching not working despite identical system prompts
Set temperature=0, top\_p=1, and ensure the first 1000\+ tokens are bit-for-bit identical across requests; any deviation in temperature or early tokens breaks cache hits silently.
Journey Context:
OpenAI's prompt caching \(launched Oct 2024\) matches only if the prompt prefix is identical and specific parameters are fixed. Many assume 'same system prompt' is sufficient, but temperature>0 or presence\_penalty changes alter the cache key. The cost impact is 2x-10x because you pay for input tokens at full price instead of the discounted cache-hit rate. The alternative of reducing temperature to 0 is usually acceptable for deterministic agent workflows.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:45:10.839677+00:00— report_created — created