Report #56136
[gotcha] Streaming responses that include function/tool calls break UI state machines that only handle text streaming
Design your streaming UI state machine to handle three phases of a response: text content deltas, function call argument deltas, and function result processing. When a function call arrives mid-stream, transition the UI from a 'streaming text' state to an 'executing tool' state with a clear visual indicator. Buffer any text content that preceded the function call so it can be re-displayed once the function result arrives and streaming resumes.
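A minimal sketch of such a state machine follows. The event shapes (`text_delta`, `tool_call_delta`, `tool_result`, `done`) and the `StreamUI` class are hypothetical names chosen for illustration, not any provider's wire format; a real client would first map its provider's stream chunks onto events like these.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class UIState(Enum):
    IDLE = auto()
    STREAMING_TEXT = auto()
    EXECUTING_TOOL = auto()
    PROCESSING_RESULT = auto()
    DONE = auto()


@dataclass
class StreamUI:
    """Hypothetical UI model. Event shapes are assumptions, not a provider format."""

    state: UIState = UIState.IDLE
    text_parts: List[str] = field(default_factory=list)  # text already shown
    tool_name: Optional[str] = None
    tool_arg_parts: List[str] = field(default_factory=list)  # raw JSON fragments

    def handle(self, event: dict) -> None:
        kind = event["type"]
        if kind == "text_delta":
            # Text can open the stream or resume after a tool result.
            self.text_parts.append(event["text"])
            self.state = UIState.STREAMING_TEXT
        elif kind == "tool_call_delta":
            # Argument fragments are collected, never rendered as text.
            self.tool_name = event.get("name") or self.tool_name
            self.tool_arg_parts.append(event.get("arguments", ""))
            self.state = UIState.EXECUTING_TOOL
        elif kind == "tool_result":
            if self.state is not UIState.EXECUTING_TOOL:
                raise ValueError("tool_result received with no tool call in flight")
            self.tool_name = None
            self.tool_arg_parts.clear()
            self.state = UIState.PROCESSING_RESULT
        elif kind == "done":
            self.state = UIState.DONE
        else:
            raise ValueError(f"unknown event type: {kind!r}")

    def render(self) -> str:
        """Return what the user should see: prior text plus a status indicator."""
        text = "".join(self.text_parts)
        if self.state is UIState.EXECUTING_TOOL:
            return f"{text}[running {self.tool_name}...]"
        if self.state is UIState.PROCESSING_RESULT:
            return f"{text}[processing result...]"
        return text
```

Because earlier text stays in `text_parts` across every transition, the text that preceded the function call is still there when streaming resumes, and `render()` only adds or removes the status indicator.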
Journey Context:
When using function/tool calling with streaming, the AI may stream some text ('Let me look that up...') and then emit a function call. At that point the streaming response format changes: instead of content deltas, you receive the function call's arguments as deltas. UIs built only for text streaming break at this transition: they crash, show garbled function call JSON as text, or silently drop the function call. The user sees incomplete text followed by an unexplained loading state or error. This is especially tricky because function calls can arrive at any point in the stream, and multiple function calls can occur in a single response. The tradeoff: handling all stream types adds UI complexity, but not handling them creates a broken, confusing UX. The right approach is a state machine that explicitly models text streaming, tool execution, and result processing as distinct states with clear transitions.
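Since several function calls can occur in one response and their arguments arrive as fragments, one way to avoid showing or parsing partial JSON is to accumulate fragments per call and parse only once the stream has ended. The sketch below assumes each delta carries an `index` identifying its call, an optional `name`, and an `arguments` fragment; this shape and the `assemble_tool_calls` helper are hypothetical, so adapt the field names to whatever your provider actually emits.

```python
import json
from typing import Dict, Iterable, List


def assemble_tool_calls(deltas: Iterable[dict]) -> List[dict]:
    """Merge argument fragments per call index, then parse each call's JSON.

    Assumed (hypothetical) delta shape:
        {"index": int, "name": str (optional), "arguments": str (fragment)}
    """
    partial: Dict[int, dict] = {}
    for delta in deltas:
        call = partial.setdefault(delta["index"], {"name": None, "fragments": []})
        if delta.get("name"):
            call["name"] = delta["name"]
        call["fragments"].append(delta.get("arguments", ""))

    calls = []
    for index in sorted(partial):
        call = partial[index]
        # Parse only after all fragments are joined; a single fragment
        # is usually not valid JSON on its own.
        arguments = json.loads("".join(call["fragments"]))
        calls.append({"name": call["name"], "arguments": arguments})
    return calls


# Usage: two calls whose fragments arrive interleaved in one response.
stream = [
    {"index": 0, "name": "get_weather", "arguments": '{"city":'},
    {"index": 1, "name": "get_time", "arguments": '{"tz": "'},
    {"index": 0, "arguments": ' "Paris"}'},
    {"index": 1, "arguments": 'UTC"}'},
]
calls = assemble_tool_calls(stream)
```

Keying the buffer by call index keeps interleaved fragments from different calls apart, and deferring `json.loads` until the end means the UI never has to interpret a half-finished argument string.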
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:43:14.023327+00:00— report_created — created