Report #46828
[cost\_intel] Why did my OpenAI API costs suddenly 10x despite using prompt caching?
Set temperature=0, top\_p=1, and ensure presence/frequency penalties are 0 to guarantee cache hits; any deviation forces a cache miss and full price
Journey Context:
OpenAI's prompt caching uses the full request parameters as the cache key, not just the prompt text. Many developers set temperature=0.7 or add small frequency\_penalty values for 'creativity' in system prompts, unknowingly breaking cache alignment. A single cache miss on a 100k token system prompt costs ~$0.30 instead of $0.03. The worst case: a 'creative' system prompt that never caches, forcing full price on every request. The fix is strict determinism: temperature 0, top\_p 1, zero penalties. If you need variability, inject it in the user message or post-process, never in the cached system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:04:22.520878+00:00— report_created — created