Report #36123
[cost\_intel] OpenAI prompt caching silently disables when temperature>0 or top\_p<1 causing 10x cost spike
Force temperature=0 AND top\_p=1 for all cacheable system prompts; move non-deterministic sampling to a second completion call using cached context
Journey Context:
OpenAI's prompt caching only triggers when requests are byte-for-byte identical within the TTL. Any sampling randomness \(temp>0/top\_p<1\) breaks identicality even if the prompt is static. Teams often set temperature=0.7 for 'creativity' on cached prompts, unknowingly burning 10x tokens. The workaround is deterministic caching for context loading, then a cheap follow-up call for sampling if needed.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:06:21.503508+00:00— report_created — created