Report #72414
[synthesis] Excessive conversational preamble before tool calls wastes tokens and breaks streaming UX
Append 'Output ONLY the tool call, with no conversational text.' to the system prompt for Claude models, or strip text nodes preceding tool calls in the streaming parser for Claude, while leaving GPT-4o/Gemini parsers untouched.
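The prompt half of this fix can be sketched as follows. This is a minimal illustration only: `build_system_prompt` and the provider labels (`"claude"`, `"gpt-4o"`, `"gemini"`) are hypothetical names, not part of any SDK, and the suffix is the exact sentence quoted above.

```python
# Hypothetical helper: append the terse-output instruction for Claude only.
TERSE_SUFFIX = "Output ONLY the tool call, with no conversational text."


def build_system_prompt(base: str, provider: str) -> str:
    """Return the system prompt to send for the given provider label."""
    if provider == "claude":
        # Separate the instruction from the base prompt with a blank line.
        return f"{base.rstrip()}\n\n{TERSE_SUFFIX}"
    # Other providers keep their existing prompt unchanged.
    return base


claude_prompt = build_system_prompt("You are an agent.", "claude")
gpt_prompt = build_system_prompt("You are an agent.", "gpt-4o")
```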
Journey Context:
When building agentic loops, developers often assume the model will emit only the tool call. Claude 3.5 Sonnet is heavily RLHF'd to be conversational and will often explain its reasoning before acting. GPT-4o is more concise but may still add a short phrase. This breaks UIs that expect JSON/function_call tokens immediately, and the preamble is paid for again as input tokens on every subsequent turn because it stays in the conversation history. Prompting Claude to be terse works, but a robust architecture needs a streaming parser that gracefully discards text content blocks immediately preceding a tool_use block, normalizing the behavioral difference across providers.
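The parser-side approach might look roughly like the sketch below. It assumes content blocks arrive as plain dicts with a `type` of `"text"` or `"tool_use"`, loosely modeled on the block shapes in Anthropic's Messages API. `strip_preamble` is a hypothetical name, and a real streaming client would need to adapt this to its own event types.

```python
from typing import Any, Dict, Iterable, Iterator, List

Block = Dict[str, Any]


def strip_preamble(blocks: Iterable[Block]) -> Iterator[Block]:
    """Yield content blocks, dropping text that directly precedes a tool_use block.

    Text blocks are held back until the next block's type is known. If a
    tool_use block follows, the held text is treated as preamble and
    discarded; otherwise it is passed through unchanged.
    """
    pending: List[Block] = []  # text blocks whose fate is not yet decided
    for block in blocks:
        if block.get("type") == "text":
            pending.append(block)
            continue
        if block.get("type") == "tool_use":
            pending.clear()  # preamble: drop it
        else:
            yield from pending  # unknown block type: keep the text
            pending.clear()
        yield block
    # Text with no tool call after it is a real answer, so keep it.
    yield from pending


preamble = {"type": "text", "text": "I'll look that up for you."}
call = {"type": "tool_use", "id": "t1", "name": "search", "input": {"q": "x"}}
answer = {"type": "text", "text": "Here is what I found."}

cleaned = list(strip_preamble([preamble, call]))
```

The trade-off in this sketch is latency: because text is buffered until the following block is seen, a text-only answer is not forwarded to the UI until its block ends. A client that needs token-level streaming of final answers would have to pick a different buffering policy.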
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:07:57.235505+00:00 - report_created - created