Report #54427
[cost\_intel] System prompt cache misses causing 10x cost spikes in production
Pin temperature to 0, use cache-aware client libraries, and monitor x-cache-hit headers to detect silent misses
Journey Context:
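Prompt caching requires an exact byte-for-byte match of the cached prompt, including whitespace, and the same API version. In the reported setup, any temperature above 0, even temperature=0.001, was associated with cache misses, which the report attributes to sampling randomness breaking cache keys; whether sampling parameters are part of the cache key depends on the provider, so confirm this against your provider's documentation. Many implementations overlook the cache-hit status returned with each response \(reported here as an x-cache-hit: true/false header; some providers instead expose cached-token counts in the usage object of the response body\) and silently pay full price. Caching prompts manually in a key-value store is an alternative, but the added latency often outweighs the cost savings for contexts under 100k tokens.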
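The checks described above could be sketched roughly as follows. This is a minimal illustration, not any provider's API: `prompt_fingerprint`, `cache_hit`, and `CacheMonitor` are hypothetical helper names, and the `x-cache-hit` header name is taken from this report and may not match what your provider actually returns. The fingerprint makes whitespace or API-version drift visible before it shows up on the bill, and the monitor counts hits and misses so silent misses can be alerted on.

```python
import hashlib
from typing import Mapping, Optional


def prompt_fingerprint(system_prompt: str, api_version: str) -> str:
    """Hash the exact bytes that must stay stable for a cache hit."""
    digest = hashlib.sha256()
    digest.update(api_version.encode("utf-8"))
    digest.update(b"\x00")  # separator so version and prompt cannot run together
    digest.update(system_prompt.encode("utf-8"))
    return digest.hexdigest()


def cache_hit(headers: Mapping[str, str]) -> Optional[bool]:
    """Return True/False from the x-cache-hit header, or None if it is absent.

    Assumption: the header name comes from this report; adjust for your provider.
    """
    for name, value in headers.items():
        if name.lower() == "x-cache-hit":
            return value.strip().lower() == "true"
    return None


class CacheMonitor:
    """Track the hit rate and flag fingerprint drift between requests."""

    def __init__(self) -> None:
        self.hits = 0
        self.misses = 0
        self.unknown = 0
        self._last_fingerprint: Optional[str] = None

    def record(self, fingerprint: str, headers: Mapping[str, str]) -> bool:
        """Record one response; return True if the prompt bytes changed."""
        drifted = (
            self._last_fingerprint is not None
            and fingerprint != self._last_fingerprint
        )
        self._last_fingerprint = fingerprint
        status = cache_hit(headers)
        if status is True:
            self.hits += 1
        elif status is False:
            self.misses += 1
        else:
            self.unknown += 1
        return drifted

    def hit_rate(self) -> float:
        """Fraction of responses with a known status that were hits."""
        known = self.hits + self.misses
        return self.hits / known if known else 0.0
```

In use, one would compute the fingerprint once per outgoing request, pass the response headers to `CacheMonitor.record`, and alert when `hit_rate()` drops or `record` reports drift. A single trailing space in the system prompt is enough to change the fingerprint.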
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:51:05.527422+00:00— report_created — created