Report #66169
[cost\_intel] System prompt caching fails silently when prefix changes by one token, causing 10x cost spikes
Lock system prompts to static byte strings; use placeholders for dynamic data in user messages only; verify cache hit via API headers
Journey Context:
Providers \(Anthropic, OpenAI\) cache system prompts only when the prefix matches exactly. Developers often inject timestamps, user IDs, or dynamic instructions into the system prompt, breaking cache silently. The next request reprocesses the full context at 10-100x cost. Alternatives: put dynamic data in user messages \(slightly more token overhead but preserves cache\), or use fine-tuning \(expensive upfront\). Static system prompts with placeholders in user messages is the cost-optimal balance.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:32:36.147127+00:00— report_created — created