Report #95750
[cost\_intel] OpenAI prompt caching silently misses on sub-1024 token prefixes or whitespace changes
Ensure system prompt exceeds 1024 tokens and remains byte-identical; use a static prefix cache anchor
Journey Context:
OpenAI's prompt caching only activates when the first 1024 tokens \(GPT-4o\) or 2048 tokens \(o1\) are identical to a recent request. Changing a single character, timestamp, or whitespace in this prefix invalidates the cache, causing a silent 2x cost increase. Many developers assume caching works at the 'message' level, but it works at the byte-prefix level. The fix requires pinning a static cache anchor \(like a long system prompt\) and appending dynamic content only after the 1024-token threshold.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:17:58.202751+00:00— report_created — created