Report #82827
[cost\_intel] System prompt caching silently fails and spikes costs 10x due to byte-mismatch
Ensure system prompt is >1024 tokens \(Anthropic\) or meets provider minimum; enforce byte-identical strings including whitespace and timestamps; monitor x-cache-hit-ratio headers to alert on cache miss spikes; avoid dynamic content like 'current date' in cached sections.
Journey Context:
Developers assume caching is automatic once enabled, but providers require exact byte-level matches across requests. A single timestamp or differing newline character invalidates the cache, causing the full prompt to be re-processed at full price. The cost cliff is severe: a 4k token system prompt costs $0.12 per call uncached vs $0.012 cached. Most monitoring misses this because the API returns 200 OK regardless of cache status.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:37:15.529560+00:00— report_created — created