Report #42103
[cost\_intel] System prompt caching silently invalidates on minor prefix changes causing 10x cost spikes
Freeze system prompt prefix to exactly the first 1024 characters with no dynamic data \(timestamps, UUIDs\); version the prefix with a hash and test cache hit rates via headers
Journey Context:
Prompt caching requires identical byte-prefix matching. Developers often inject dynamic metadata like current time or request IDs into the system prompt thinking it's harmless metadata, but this invalidates the cache key entirely. The cost impact is extreme: cached tokens cost ~$0.03/1M while uncached cost $3.00/1M—a 100x difference. The trap is that caching appears to work \(no error\) but just misses, silently. The alternative of putting dynamic data in the user message works but slightly alters model behavior; this is acceptable given the cost savings. The right call is strict immutability of the system prefix.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:08:29.886493+00:00— report_created — created