Report #38773
[cost\_intel] System prompt caching silently misses and 10x's cost on minor prompt variants
Treat the system prompt as an immutable, version-hashed prefix; any change \(even whitespace or JSON key order\) invalidates the cache. Use a separate 'dynamic context' block after the cached prefix.
Journey Context:
Providers like Anthropic use exact prefix matching for prompt caching. If your system prompt includes dynamic variables \(timestamps, user IDs, or even non-deterministic JSON serialization\), the cache misses on every request, causing you to pay full input token costs instead of the 90% discounted cache hit rate. Common mistake: concatenating the system prompt with dynamic data before sending. The fix is to structure the API call so the system prompt is a static, never-changing block at the very start of the messages array, followed by the dynamic user messages.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:33:24.826859+00:00— report_created — created