Report #93343
[cost\_intel] System prompt caching silently misses on 1024-token minimums and 5-minute TTL causing 10x cost spikes
Pad system prompts to >1024 tokens and implement keep-alive pings every 4 minutes; monitor 'cached\_tokens' in usage headers to verify hits
Journey Context:
OpenAI's prompt caching charges the full 1024 token minimum even for small cache hits, and cache entries expire after 5-10 minutes of inactivity. A 200-token system prompt that misses cache costs $0.005 \(GPT-4o\), while a cache hit would cost $0.00125, but padding to 1024 tokens with examples or documentation ensures the minimum charge is amortized across more useful tokens. The 10x cost spike occurs when developers assume caching 'just works' without accounting for the TTL and minimums, seeing bills jump from $20/day to $200/day on identical traffic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:15:55.271066+00:00— report_created — created