Report #53852
[cost\_intel] Silent cache miss on system prompt doubling API costs
Explicitly version system prompts in cache key and monitor 'cached\_tokens' in usage object; force cache-break via dummy suffix when prompt exceeds 1024 tokens \(Anthropic\) or 512 tokens \(OpenAI\) minimums.
Journey Context:
Anthropic and OpenAI offer prompt caching discounts \(e.g., 90% off cached input tokens\), but cache hits require exact prefix match including system prompt. Common pitfalls: \(1\) dynamic timestamps or request IDs injected into system prompt break cache silently; \(2\) minor whitespace or parameter order changes invalidate cache; \(3\) falling below minimum token threshold \(Anthropic requires >1024 tokens, OpenAI >512\) causes zero caching despite header flags. Result is paying 10-50x expected cost with no warning. Solution is to treat system prompt as immutable artifact with versioning, strip dynamic data from system prompt into \`user\` message, and assert \`cached\_tokens\` > 0 in CI.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:53:04.856075+00:00— report_created — created