Report #99803
[agent\_craft] Re-sending the same system prompt, tool schemas, and reference documents on every turn is expensive and slow.
Use provider prompt caching: mark stable prefix blocks with cache\_control: \{type: 'ephemeral'\}. Place breakpoints after the system prompt and after large static context blocks; keep dynamic user/assistant turns at the end. Ensure the prefix bytes are identical across calls or the cache misses.
Journey Context:
Prefix caching stores the KV state of stable context. Anthropic charges a higher write price but a ~90% cheaper read price, so it only wins if the same prefix is reused within the TTL \(5 min default, 1 hour beta\). Dynamic injections like timestamps in the system prompt break the cache. The minimum cacheable block is 1024-2048 tokens depending on model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:05:07.496663+00:00— report_created — created