Report #88647
[gotcha] Streaming breaks when the model invokes a tool mid-response
Do not stream text tokens to the UI until you have confirmed the model is not going to invoke a tool. Buffer the stream and inspect it for tool_call deltas first. As a fallback, if a tool call is detected after text has already been streamed, hide the already-rendered text, replace it with a 'Using tool…' state, and resume streaming after the tool result returns. Use stream_options to get early signals where your provider exposes them.
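A minimal sketch of the buffering gate described above, not a definitive implementation. It assumes each stream chunk has already been reduced to a plain dict with optional "content" and "tool_calls" keys (a simplification loosely modeled on chat-completion style deltas; real SDK objects differ). The event names "text", "retract" and "status", the function name gate_stream, and the hold_chars threshold are all hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

TOOL_STATUS = "Using tool…"

def gate_stream(chunks: Iterable[dict], hold_chars: int = 80) -> Iterator[Tuple[str, str]]:
    """Turn raw delta dicts into UI events, holding text back until it looks safe.

    Assumed chunk shape (hypothetical): {"content": str} and/or {"tool_calls": [...]}.
    """
    buffered: List[str] = []
    buffered_len = 0
    released = False   # True once any text has been shown to the user
    tool_seen = False  # True once a tool-call delta has appeared

    for chunk in chunks:
        if chunk.get("tool_calls"):
            if not tool_seen:
                tool_seen = True
                if released:
                    # Fallback path: text is already on screen, ask the UI to hide it.
                    yield ("retract", "")
                buffered.clear()  # drop any held preamble instead of rendering it
                yield ("status", TOOL_STATUS)
            continue

        text = chunk.get("content") or ""
        if not text or tool_seen:
            # After a tool call, answer text is expected from a follow-up stream.
            continue

        if released:
            yield ("text", text)
            continue

        buffered.append(text)
        buffered_len += len(text)
        if buffered_len >= hold_chars:
            # No tool call within the hold window: release what was held.
            released = True
            yield ("text", "".join(buffered))
            buffered.clear()

    # Stream ended without a tool call and below the threshold: flush the buffer.
    if not tool_seen and not released and buffered:
        yield ("text", "".join(buffered))
```

A larger hold_chars trades latency for a lower chance of hitting the retract path; passing a very large value holds text until the stream ends.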
Journey Context:
A common failure: you start streaming text to the user, then the model decides partway through that it needs to call a function. The UI has already rendered partial text that now needs to be replaced or contextualized. This creates a jarring 'rewriting history' effect that erodes user trust. Teams try to patch the already-rendered content by appending the tool result inline, but this produces an incoherent narrative. The fix feels counter-intuitive because buffering adds latency, but it prevents the much worse UX of visible content mutation. Some models emit a few tokens of 'Let me search for that…' before a tool call; these should be intercepted and converted to a status indicator, not rendered as final output.
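One way to convert an intercepted preamble into a status indicator is sketched below, assuming the preamble text has already been held back rather than rendered. The function name preamble_to_status, the tool_name parameter, and the max_len limit are hypothetical and chosen only for illustration.

```python
from typing import Optional

def preamble_to_status(preamble: str, tool_name: Optional[str] = None, max_len: int = 60) -> str:
    """Build a short status label from text the model emitted before a tool call."""
    # Collapse newlines and repeated spaces so the label fits on one line.
    text = " ".join(preamble.split())
    if not text:
        # No usable preamble: fall back to a generic label.
        return f"Using {tool_name}…" if tool_name else "Using tool…"
    if len(text) > max_len:
        # Truncate long preambles and mark the cut with an ellipsis.
        text = text[: max_len - 1].rstrip() + "…"
    return text
```

The label is shown as transient status and discarded once the tool result arrives, so it never becomes part of the final answer.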
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:22:57.288726+00:00— report_created — created