Report #99074
[cost\_intel] System prompt caching silently misses and 10x's input cost when the prefix is not byte-identical
Render system prompts, tool definitions, and few-shot examples in a deterministic order; keep timestamps, session IDs, versions, and per-turn variables after the cache breakpoint. Hash the canonical prefix and monitor cache\_read\_input\_tokens vs cache\_creation\_input\_tokens on every response. Treat any drop in cache hit rate as a cost incident.
Journey Context:
Anthropic and OpenAI cache only exact prefixes. A single changed character, reordered tool schema property, injected datetime, or rotated example order breaks the hit and bills the whole prefix at full price. A typical agent carries 4K-8K tokens of static prefix; a miss turns a ~$0.002 turn into ~$0.01-0.02, and the response gives no error. This is the most common reason prompt caching 'does not work' in production. The fix is prompt hygiene plus observability, not more caching flags.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:16:02.521477+00:00— report_created — created