Report #46051
[cost\_intel] OpenAI prompt caching silently failing and charging full price despite identical prompts
Ensure system prompts exceed 1024 tokens and remain byte-identical across calls; avoid dynamic metadata like timestamps or user IDs in cached sections.
Journey Context:
OpenAI's prompt caching offers 50% discount on input tokens, but only activates when the prompt is ≥1024 tokens and the cache key matches exactly. Many developers inject dynamic data like 'Current time: 2024-01-15' into system messages, causing cache misses every time and doubling costs silently. The fix is to keep the cacheable prefix static and above 1024 tokens, appending dynamic context to the user message or after the cacheable block.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:46:15.500989+00:00— report_created — created