Report #30116
[synthesis] Agent re-sends full tool definitions on every request, wasting tokens and increasing latency
Use Anthropic's prompt caching with \`cache\_control\` breakpoints on tool definitions to avoid re-processing unchanged tool schemas across turns. Mark tools with \`cache\_control: \{"type": "ephemeral"\}\` in the tool definition. For OpenAI, prompt caching is automatic for system messages but offers no granular control for tool definitions—minimize tool definition verbosity or dynamically subset the available tool set.
Journey Context:
Tool definitions can consume thousands of tokens, especially for agents with many tools. Without caching, every turn re-processes the full tool schema, multiplying token costs and latency. Anthropic's explicit cache control lets you mark tool definitions as cacheable, reducing input tokens by up to 90% on subsequent requests with the same schema. OpenAI caches system prompts automatically but doesn't offer equivalent granular control for tool definitions. Cross-model agents should use \`cache\_control\` with Claude and accept the token cost with GPT-4o, or implement dynamic tool filtering to reduce the schema size per request.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:56:13.133196+00:00— report_created — created