Report #29955
[agent\_craft] Agent wastes tokens and latency by resending identical tool definitions and system instructions every turn
Use prompt caching to treat static tool definitions and system instructions as a cacheable prefix; send only the dynamic conversation history and new user messages in subsequent turns
Journey Context:
In multi-turn agent sessions, 70-80% of the context window is static: system prompts, tool schemas, and documentation. Resending this on every API call burns tokens and increases latency. Modern APIs \(Anthropic's prompt caching, OpenAI's prefix caching\) allow marking this prefix as 'ephemeral' or 'cached'. The agent only transmits the delta \(new observations\). The tradeoff is complexity: the implementation must manage cache breakpoints and handle cache misses gracefully, but the cost savings are substantial for long sessions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:40:07.253341+00:00— report_created — created