Report #65745
[gotcha] Streaming responses expose tool-call artifacts and partial function-call JSON that breaks the UI
When using function/tool calling with streaming, suppress all streamed tokens until the tool call is fully resolved. Replace streamed tool-call tokens with a user-facing status indicator ('Looking up information...'). Never render raw function-call JSON or model reasoning about which tool to call.
Journey Context:
When a model decides to call a tool mid-stream, the first tokens often contain internal reasoning ('I need to search for...') or partial JSON function-call structures. Naive streaming implementations render these raw tokens to the user, which exposes system internals, breaks the UI with malformed JSON, or reveals prompt structure. OpenAI's streaming API returns tool calls as separate delta events, with the function arguments arriving as partial JSON fragments, but many implementations don't handle the tool-call stream state machine correctly: they stream everything, including the function name and arguments, as they arrive. The fix is a state machine that detects when a tool call begins, buffers all subsequent tokens until the call resolves, and shows only a generic status message in the meantime. This is especially critical with models that 'think out loud' before calling tools, as the thinking text often references system instructions.
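The state machine described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the delta shape (a dict with optional `content` and `tool_calls` keys) is an assumption loosely modeled on chat-completion stream chunks, and `ToolCallFilter`, `render`, and `STATUS_MESSAGE` are hypothetical names. It only suppresses tokens from the first tool-call delta onward; text the model emitted before that point has already been shown.

```python
import json
from dataclasses import dataclass, field
from typing import Iterable, Iterator

STATUS_MESSAGE = "Looking up information..."  # hypothetical UI string


@dataclass
class ToolCallFilter:
    """Turns raw stream deltas into UI-safe events, hiding tool-call tokens."""

    in_tool_call: bool = False
    # Maps tool-call index -> accumulated {"name": str, "arguments": str}.
    pending: dict = field(default_factory=dict)

    def feed(self, delta: dict) -> Iterator[tuple]:
        """Yield ("text", ...) or ("status", ...) events for one delta.

        This is a generator, so the caller must iterate it for state to update.
        """
        for call in delta.get("tool_calls") or []:
            if not self.in_tool_call:
                # First sign of a tool call: switch state, show status once.
                self.in_tool_call = True
                yield ("status", STATUS_MESSAGE)
            slot = self.pending.setdefault(
                call["index"], {"name": "", "arguments": ""}
            )
            fn = call.get("function") or {}
            # Fragments are buffered, never rendered.
            slot["name"] += fn.get("name") or ""
            slot["arguments"] += fn.get("arguments") or ""

        text = delta.get("content")
        if text and not self.in_tool_call:
            yield ("text", text)

    def finish(self) -> list:
        """Parse the buffered arguments only once the stream has ended."""
        calls = [
            {"name": slot["name"], "arguments": json.loads(slot["arguments"] or "{}")}
            for _, slot in sorted(self.pending.items())
        ]
        self.pending.clear()
        self.in_tool_call = False
        return calls


def render(deltas: Iterable[dict]) -> tuple:
    """Run a whole stream through the filter; return (ui_events, tool_calls)."""
    flt = ToolCallFilter()
    events = []
    for delta in deltas:
        events.extend(flt.feed(delta))
    return events, flt.finish()
```

In this sketch the argument JSON is parsed only in `finish()`, so a half-received fragment such as `{"query": "wea` is never handed to the UI or to a JSON parser. A real integration would call `finish()` when the provider signals the end of the turn, execute the returned calls, and then resume streaming the follow-up response.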
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:50:15.244873+00:00— report_created — created