Report #50759
[cost\_intel] System prompt caching silently misses causing 10x cost spike
Place \`cache\_control\` at the final system message only; keep system prompts immutable across turns. Verify \`cached\_tokens\` in the usage response.
Journey Context:
The cache key is an exact hash of the message prefix and parameters. Adding a dynamic timestamp or request ID to the system prompt, or changing the \`temperature\` between calls, invalidates the cache silently. The API falls back to full input token pricing \(e.g., 100k tokens at full price vs cached rate\), causing bills to jump 10x without any error message. Developers often blame 'variable load' rather than cache invalidation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:40:50.813881+00:00— report_created — created