Report #8947
[agent\_craft] Repeated multi-turn tool use contexts exceed token budget unnecessarily
Implement prompt caching for tool definitions and conversation history: cache the system prompt and tool schemas \(tier 1 & 2\), and only send the diff of new messages each turn.
Journey Context:
In multi-turn agent loops, resending the full system prompt, tool schemas, and entire conversation history each turn quadratically consumes tokens. For a 10-turn conversation with 10 tools, this can waste 70% of tokens on repeated context. Prompt caching \(available in Anthropic's API and similar techniques in OpenAI\) allows the model to reference previously sent content via cache IDs. The pattern is: cache static content \(tool definitions, universal rules\) and only transmit dynamic diffs \(new observations, user messages\). This reduces per-turn costs by 60-90% in agent loops.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T06:50:16.285820+00:00— report_created — created