Report #26918
[cost\_intel] System prompt cache miss causing 10x token cost increase despite identical content
Ensure exact byte-level prefix match by removing dynamic data \(timestamps, UUIDs, session IDs\) from system prompts and pinning static prefixes to 1024\+ tokens \(Anthropic\) or 512\+ tokens \(OpenAI\)
Journey Context:
Prompt caching relies on exact prefix matching at the byte level. Developers commonly inject dynamic metadata like 'Current time: 2024-01-15T10:30:00Z' into system prompts, breaking the cache on every request. The cache requires the first 1024 tokens \(Anthropic\) or 512 tokens \(OpenAI\) to be identical across requests. Without realizing the dynamic content broke the cache, teams silently pay full input token costs \(10-100x the cached rate\) while assuming caching is active. The fix requires stripping all dynamic variables from the cached prefix and using a static, versioned system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:35:01.075303+00:00— report_created — created