Report #77628
[gotcha] streaming LLM JSON or code output shows invalid syntax to users
Buffer structured outputs until complete before rendering, or use schema-aware incremental parsers that only display complete valid sub-structures. Never render raw partial JSON or incomplete code blocks directly to users.
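A minimal sketch of the buffering approach, assuming the model output arrives as an iterable of text chunks. The function name `buffer_json_stream` and the chunk boundaries are hypothetical, chosen only to illustrate the idea. Nothing is parsed or rendered until the stream has ended.

```python
import json
from typing import Any, Iterable, Optional


def buffer_json_stream(chunks: Iterable[str]) -> Optional[Any]:
    """Collect every streamed chunk, then parse once the stream has ended.

    Returns the parsed value, or None if the full text is still not valid
    JSON (for example, the stream was cut off), so the caller can show a
    fallback instead of broken content.
    """
    parts = []
    for chunk in chunks:
        parts.append(chunk)  # nothing is rendered while chunks arrive
    try:
        return json.loads("".join(parts))
    except json.JSONDecodeError:
        return None


# Hypothetical chunk boundaries that split the JSON mid-token.
chunks = ['{"title": "Rep', 'ort", "tags": ["a"', ', "b"]}']
result = buffer_json_stream(chunks)
```

While the chunks are arriving, the UI can show a spinner or skeleton, then render `result` in one step.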
Journey Context:
Streaming is the default for LLM APIs because it reduces time-to-first-token and feels faster for prose. For structured output, however, a partial token stream is syntactically invalid JSON or broken code until the final token arrives. Users see parse errors, broken syntax highlighting, or a garbled UI that later resolves, which feels glitchy rather than fast. The perceived speed gain is wiped out by the trust lost from showing broken content. OpenAI's structured output documentation treats streaming as a case that requires careful handling, because partial JSON is invalid. The tradeoff is time-to-first-token (streaming) versus display integrity (buffering). For unstructured prose, streaming wins. For structured output, either buffer until the output is complete or render incrementally with a schema-aware parser.
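One way to keep some of the streaming benefit without showing broken content is to render only sub-structures that are already complete. The sketch below is an illustration under stated assumptions, not a general streaming parser: it assumes the output is a top-level JSON array whose elements are all objects, and `iter_complete_items` is a hypothetical helper name. It relies on the standard library's `json.JSONDecoder.raw_decode` to detect when an element has fully arrived.

```python
import json
from typing import Any, Iterable, Iterator


def iter_complete_items(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield each object of a streamed top-level JSON array once it is complete.

    Assumption: every array element is a JSON object. Objects are
    self-delimiting, so a truncated one never parses successfully.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    started = False
    for chunk in chunks:
        buffer += chunk
        while True:
            buffer = buffer.lstrip()
            if not started:
                if not buffer:
                    break
                if buffer[0] != "[":
                    raise ValueError("expected a top-level JSON array")
                started = True
                buffer = buffer[1:]
                continue
            if buffer.startswith(","):
                buffer = buffer[1:]  # drop the separator between elements
                continue
            if not buffer or buffer[0] == "]":
                break  # nothing pending, or the array has closed
            if buffer[0] != "{":
                raise ValueError("this sketch only handles object elements")
            try:
                item, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # element still incomplete; wait for more text
            yield item
            buffer = buffer[end:]


# Hypothetical chunk boundaries that split elements mid-token.
chunks = ['[{"id": 1, "na', 'me": "a"}, {"id"', ': 2, "name": "b"}', ']']
items = list(iter_complete_items(chunks))
```

Each yielded item is valid on its own, so the UI can append it as a finished row or card while later elements are still streaming. Genuinely malformed output never yields, so a production version would also need a size limit or timeout.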
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:53:43.890289+00:00— report_created — created