Report #22403
[gotcha] Streaming text before a tool/function call creates orphaned content that breaks the UI narrative flow
When using streaming with function calling, buffer text deltas until you receive either a complete tool_call or a finish_reason. If a tool call arrives after text, either discard the buffered text, replace it with a transitional UI element ('Looking up information...'), or clearly delineate the transition from conversational text to tool execution in the UI.
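The buffering rule above can be sketched roughly as follows. This is a minimal illustration, not any provider's API: the `Chunk` shape, the `UiEvent` type, the `render_events` function, and the three policy names are all hypothetical stand-ins, and real streaming payloads (where tool calls usually arrive as several fragments) will need their own parsing before reaching this step.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Literal, Optional

Policy = Literal["discard", "placeholder", "delineate"]


@dataclass
class Chunk:
    """Hypothetical, simplified stream chunk; real provider payloads differ."""
    text: Optional[str] = None           # a text delta, if any
    tool_call: Optional[str] = None      # name of a fully received tool call
    finish_reason: Optional[str] = None  # e.g. "stop" or "tool_calls"


@dataclass
class UiEvent:
    kind: Literal["text", "status", "tool"]
    value: str


def render_events(chunks: Iterable[Chunk],
                  policy: Policy = "placeholder") -> Iterator[UiEvent]:
    """Hold text back until the stream shows whether a tool call follows."""
    buffered: List[str] = []
    for chunk in chunks:
        if chunk.text:
            buffered.append(chunk.text)
        if chunk.tool_call:
            # Text seen so far was a preamble to a tool call.
            preamble = "".join(buffered)
            buffered.clear()
            if policy == "placeholder":
                yield UiEvent("status", "Looking up information...")
            elif policy == "delineate" and preamble:
                yield UiEvent("text", preamble)
                yield UiEvent("status", f"Calling {chunk.tool_call}")
            # policy == "discard": the preamble is dropped silently.
            yield UiEvent("tool", chunk.tool_call)
        elif chunk.finish_reason and buffered:
            # No tool call interrupted the text, so it is safe to show.
            yield UiEvent("text", "".join(buffered))
            buffered.clear()
    if buffered:
        # Stream ended without a finish_reason; flush rather than lose text.
        yield UiEvent("text", "".join(buffered))


if __name__ == "__main__":
    stream = [
        Chunk(text="Let me check "),
        Chunk(text="that for you..."),
        Chunk(tool_call="get_weather"),
        Chunk(finish_reason="tool_calls"),
    ]
    for event in render_events(stream, policy="placeholder"):
        print(event)
```

One trade-off of this sketch: text-only responses are also held until `finish_reason` arrives, so the user loses incremental rendering for them. Whether that cost is acceptable depends on how often the model emits a preamble before a tool call in your application.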
Journey Context:
In streaming mode, the model may emit conversational text tokens before deciding to call a function. These tokens are streamed to the UI immediately per standard SSE handling, so the user sees 'Let me check that for you...' and then, from their perspective, nothing visible happens while the function executes server-side. The text becomes orphaned: it was rendered but doesn't connect to the tool result that follows. Worse, the model might say 'The answer is 42' in text and then call a calculator tool that returns 43, creating a visible contradiction. The naive approach of streaming everything as it arrives creates these inconsistencies. The fix requires understanding that tool-call responses have a fundamentally different structure from text-only responses, and that the UI transition between them must be handled explicitly rather than left to emerge from raw token streaming.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:00:58.016697+00:00— report_created — created