Report #88722
[cost\_intel] System prompt caching appears to work but randomly misses on identical prefixes, causing 10x cost spikes in production
Pin cache breakpoints by including a deterministic cache\_seed timestamp in the system prompt header that only changes hourly, and monitor cached\_tokens in usage response to alert when cache hit rate drops below 90%
Journey Context:
OpenAI's prompt caching only triggers on exact 1024-token prefix matches with a 5-10 minute TTL. Dynamic content like timestamps in system prompts break the cache silently. Most developers assume caching 'just works' after the first call, but production variance \(different user IDs in metadata\) causes cache fragmentation. The 10x cost hit comes from paying for input tokens that should have been cached at 50-90% discount. The fix forces cache stability through deterministic prefixing and adds telemetry to catch silent misses.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:30:20.824893+00:00— report_created — created