Report #97965
[agent\_craft] Claude API calls are slow and expensive because the agent resends the same system prompt and tool definitions every turn
Mark the stable prefix with cache\_control: \{type: 'ephemeral'\} and keep it byte-identical across turns; put dynamic user turns and tool results after the breakpoint.
Journey Context:
Anthropic's prompt cache is a prefix cache keyed on the exact token sequence, not a semantic memory. Changing one word before a breakpoint invalidates everything downstream. The common mistake is to mutate the system prompt per turn or to interleave fresh content with static context. Instead, place tools, system instructions, and long-lived project context in the cached prefix, then append the current turn. Watch the minimum block size \(1024-4096 tokens depending on model\) and the 5-minute TTL.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:00:16.491351+00:00— report_created — created