Report #29931
[synthesis] Claude's explanatory preamble before tool calls inflates token usage and latency in tight agent loops
In system prompts for Claude-based agents, explicitly instruct: 'When using tools, invoke them directly without narrating your intent beforehand. Provide reasoning after results are returned.' For maximum efficiency, also strip text content blocks that appear immediately before tool_use blocks in the same response when they are purely preamble.
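A minimal sketch of the stripping step, assuming the assistant response content is available as a list of plain dicts with a "type" key of "text" or "tool_use" (the shape of Messages API content blocks). The names NO_PREAMBLE_INSTRUCTION and strip_preamble are hypothetical, and this version drops every text block that directly precedes a tool_use block without checking whether it is purely preamble, so it is the blunt variant of the workaround.

```python
from typing import Any

# Hypothetical constant: the instruction quoted in this report, intended to be
# appended to an agent's system prompt.
NO_PREAMBLE_INSTRUCTION = (
    "When using tools, invoke them directly without narrating your intent "
    "beforehand. Provide reasoning after results are returned."
)

Block = dict[str, Any]


def strip_preamble(blocks: list[Block]) -> list[Block]:
    """Return the blocks with any text block that sits immediately before a
    tool_use block removed. All other blocks keep their original order."""
    kept: list[Block] = []
    for i, block in enumerate(blocks):
        nxt = blocks[i + 1] if i + 1 < len(blocks) else None
        is_preamble = (
            block.get("type") == "text"
            and nxt is not None
            and nxt.get("type") == "tool_use"
        )
        if not is_preamble:
            kept.append(block)
    return kept


# Illustrative data only; the id, tool name, and input are made up.
example = [
    {"type": "text", "text": "I'll use the search tool to find the docs."},
    {"type": "tool_use", "id": "toolu_1", "name": "search", "input": {"q": "docs"}},
]
cleaned = strip_preamble(example)
```

Applying this to the assistant turn before it is appended to the conversation history keeps the tool_use block intact, so the following tool_result still has a matching call. Text that is not followed by a tool call, such as a final answer, is left alone.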
Journey Context:
Claude has a strong tendency to narrate intent before acting: 'I'll use the search tool to find...' followed by the tool_use block. In conversational contexts this is helpful, but in automated agent loops running hundreds of iterations, preamble adds significant token cost and latency. GPT-4o is more likely to emit only the tool call without preamble. The tradeoff: Claude's preamble text sometimes contains genuine reasoning about parameter choices, so blindly stripping it can lose signal. The better approach is prompt-level instruction to minimize preamble, combined with post-hoc filtering only for clearly redundant narration. Anthropic's docs on controlling tool use behavior recommend explicit prompting to shape when and how Claude uses tools.
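One possible way to decide what counts as "clearly redundant narration" is a conservative text heuristic. The sketch below is an assumption, not a tested classifier: the phrase list, the reasoning keywords, the 120-character cutoff, and the name is_redundant_preamble are all hypothetical and would need tuning against real transcripts. It errs toward keeping text whenever the preamble is long or looks like it justifies a choice.

```python
import re

# Hypothetical opener patterns for intent narration ("I'll ...", "Let me ...").
NARRATION = re.compile(
    r"^(i['’]ll|i will|let me|i['’]m going to|i am going to)\b",
    re.IGNORECASE,
)

# Hypothetical markers suggesting the text explains a choice and carries signal.
REASONING_HINTS = ("because", "since", "instead", "so that", "rather than")


def is_redundant_preamble(text: str, max_chars: int = 120) -> bool:
    """Return True only for short narration with no sign of reasoning."""
    stripped = text.strip()
    if not stripped:
        return True  # empty text carries no signal
    if len(stripped) > max_chars:
        return False  # long text may hold parameter reasoning; keep it
    lowered = stripped.lower()
    if any(hint in lowered for hint in REASONING_HINTS):
        return False  # looks like a justification; keep it
    return NARRATION.match(stripped) is not None


plain = is_redundant_preamble("I'll use the search tool to find the docs.")
reasoned = is_redundant_preamble(
    "I'll search with limit=5 because the index is rate limited."
)
```

Here plain is True and reasoned is False, so only the first preamble would be filtered. Any false negative simply leaves some preamble in place, which costs tokens but does not lose reasoning.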
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:37:50.081658+00:00— report_created — created