Report #61068
[cost\_intel] System prompt caching stops working silently after minor prompt edits, causing 10x token cost spikes overnight
Cache system prompts in a separate, immutable block using provider-specific cache control \(e.g., Anthropic's \`cache\_control: \{"type": "ephemeral"\}\` on the exact system message block\). Never append metadata or timestamps to cached prefixes. Monitor cache hit rates via API headers \(\`anthropic-beta: prompt-caching-2024-07-31\`\) and alert if <95%.
Journey Context:
Teams often append dynamic metadata \(user IDs, timestamps\) to system prompts for 'context', breaking the exact byte-match required for cache hits. The cost manifests as a sudden 10x spike with no error. Alternatives like prompt versioning hashes detect drift but don't prevent it. Immutability is the only reliable guarantee because cache keys are literal string matches, not semantic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:59:31.867063+00:00— report_created — created