Report #31220
[gotcha] Streaming structured JSON output forces full-buffer wait, destroying the streaming UX benefit
Use incremental JSON parsing (e.g., the partial-json library) to extract and render complete key-value pairs as they close during streaming. Alternatively, split the call: stream a natural-language explanation first, then deliver the structured data as a final non-streamed payload.
Journey Context:
You enable streaming for responsiveness: tokens arrive in real time and the UI updates progressively. But when you request JSON output (function calling, structured outputs), each SSE chunk carries only a fragment of the document, and the accumulated buffer is not valid JSON until the final closing bracket arrives. A standard parser rejects it until the stream completes, so the user stares at a loading spinner for the full response duration despite streaming being enabled. This is especially painful for long structured responses. The gotcha bites hard because developers enable streaming, see it working for text, then switch to structured output and wonder why the UI feels slow again. Incremental JSON parsing works by tracking open brackets (and string state) and extracting complete subtrees as they close, letting you render partial results. The alternative architecture, which streams text for UX and delivers JSON at the end for programmatic use, separates the display concern from the data concern.
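The bracket-tracking idea can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the partial-json library's implementation: the function name `complete_pairs` and the sample chunks are hypothetical, and the sketch assumes the model's output is a single top-level JSON object. It only releases a key-value pair once the comma (or closing brace) after it has arrived, so a half-streamed number or string is never rendered.

```python
import json


def complete_pairs(buffer: str) -> dict:
    """Return the top-level key-value pairs that have fully closed so far.

    Assumes `buffer` is a prefix of a single top-level JSON object.
    """
    depth = 0          # current nesting depth of {} and []
    in_string = False  # True while scanning inside a JSON string
    escaped = False    # True right after a backslash inside a string
    last_cut = -1      # index of the most recent comma at depth 1

    for i, ch in enumerate(buffer):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
            if depth == 0:
                # The whole object has closed: parse it as ordinary JSON.
                return json.loads(buffer[: i + 1])
        elif ch == "," and depth == 1:
            # Everything before this comma is a run of complete members.
            last_cut = i

    if last_cut == -1:
        return {}
    # Drop the unfinished tail and close the object ourselves.
    return json.loads(buffer[:last_cut] + "}")


if __name__ == "__main__":
    # Hypothetical chunks standing in for SSE deltas.
    chunks = ['{"title": "Rep', 'ort", "tags": ["a", ', '"b"], "summary": "Str', 'eaming"}']
    buffer, rendered = "", set()
    for chunk in chunks:
        buffer += chunk
        for key, value in complete_pairs(buffer).items():
            if key not in rendered:
                rendered.add(key)
                print(f"render {key} = {value!r}")
```

The sketch rescans the whole buffer on every chunk, which is simple but quadratic in the response length; a production parser would keep its scanner state between chunks. It also waits for the delimiter after a value, so each field appears one token later than strictly necessary.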
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:47:26.498194+00:00— report_created — created