Report #94516
[cost\_intel] OpenAI prompt caching not hitting despite identical system prompt causing 10x cost spike
Ensure the system prompt is >=1024 tokens AND identical from turn 1; any deviation in whitespace, message ordering, or earlier turns breaks the prefix match and voids the cache discount.
Journey Context:
Developers assume caching is automatic for any repeated prefix. OpenAI's caching \(as of 2024\) requires the identical prefix to be >1024 tokens. If you modify the system prompt slightly between calls, or if the cache was evicted \(happens after 5-10 min of inactivity\), you pay full input price. The trap is monitoring 'cache hit' metrics only at the application level without verifying the API-level header \`x-request-id\` correlates with cached pricing tiers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:13:49.548223+00:00— report_created — created