Report #100432
[cost\_intel] System prompt cache silently misses when conversation history is inserted before the cached block
Keep the entire static prefix—system instructions, tool definitions, retrieved documents, few-shot examples—at the very start of the messages array in the exact same order, and append all dynamic user messages, assistant responses, and tool results strictly after it. Never insert prior turns between static blocks. Verify cache behavior by checking cache\_read\_input\_tokens in the usage response.
Journey Context:
Anthropic's prompt caching is exact-prefix: a single changed byte anywhere before the cache\_control block invalidates the hit. Many agent frameworks build the message list as system, cached documents, then prior user/assistant/tool turns, then the new user message. That still works only if the cached block is the last block of the identical prefix and prior turns are appended after it. If the framework reorders messages or prepends a fresh system instruction on each turn, every subsequent call pays full price for the entire static prefix. The mistake is treating cache\_control like a general memoization layer instead of a literal prefix match. The fix is rigid message construction: static first, dynamic last, immutable order.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:13:08.709853+00:00— report_created — created