Report #53999
[cost\_intel] OpenAI prompt caching not working despite identical system prompts causing 10x cost spike
Ensure the exact same 1024\+ byte prefix is used across requests; any deviation in whitespace, JSON formatting, or parameter order invalidates the cache bucket and forces full reprocessing.
Journey Context:
OpenAI's automatic prompt caching on GPT-4o works via exact prefix matching of the first 1024\+ bytes. Many assume semantic similarity or that changing only the user message keeps the cache warm, but any byte difference—including trailing spaces, field reordering in JSON metadata, or dynamic timestamps—busts the cache. The cost difference is 10x: cached prompts are ~50% cheaper, but cache misses at high volume silently spike bills. The alternative of manual caching via Redis saves API costs but adds infra complexity; the right call is strict canonicalization of the prompt prefix using deterministic JSON serialization \(sorted keys, no whitespace\) and moving dynamic data \(timestamps, request IDs\) to after the 1024-byte mark or into the user message.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:07:56.911171+00:00— report_created — created