Report #75410
[gotcha] Why does streaming JSON from an LLM feel slower than streaming text even though tokens arrive at the same rate?
Use partial JSON parsing to render structured data incrementally as it streams, rather than buffering until the complete JSON document arrives. Either use Vercel AI SDK's `streamObject`, or implement partial JSON parsing yourself by tracking open brackets and rendering key-value pairs as they complete.
Journey Context:
Streaming text feels fast because each token is renderable the moment it arrives. Streaming JSON or other structured output creates a 'latency cliff': tokens arrive, but nothing can be rendered until enough structure exists to parse. The first meaningful render might happen seconds after the first token, which is a worse UX than a simple spinner. Developers commonly stream the raw text and parse the JSON only after completion, which loses the benefit of streaming entirely. Others try regex-based extraction on partial JSON, which breaks on nested structures. The right call is purpose-built partial JSON parsing: render completed fields as they close, giving incremental progress without waiting for the full response.
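The bracket-tracking idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Vercel AI SDK's `streamObject` works internally: the names `parse_completed` and `render_incrementally` are hypothetical, and the sketch treats a value as "complete" only once a comma or closing bracket follows it, so a half-streamed number or string is never rendered.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def parse_completed(buf: str) -> Optional[Any]:
    """Parse only the completed part of a partial JSON buffer.

    Hypothetical helper: cuts the buffer at the last point where a value is
    known to be finished, appends the missing closers, and parses the result.
    Returns None if nothing parseable has arrived yet.
    """
    stack = []        # closers needed for containers that are still open
    in_string = False
    escaped = False
    safe_cut = 0      # length of the longest prefix ending on a finished value
    safe_stack = []   # closers needed if we cut at safe_cut
    for i, ch in enumerate(buf):
        if in_string:
            # Inside a string, brackets and commas are plain text.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
            safe_cut, safe_stack = i + 1, list(stack)  # empty container is valid
        elif ch in "}]":
            if stack:
                stack.pop()
            safe_cut, safe_stack = i + 1, list(stack)  # container just closed
        elif ch == ",":
            safe_cut, safe_stack = i, list(stack)      # value before comma is done
    if safe_cut == 0:
        return None
    candidate = buf[:safe_cut] + "".join(reversed(safe_stack))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


def render_incrementally(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield a new snapshot each time another field or element completes."""
    buf, last = "", None
    for chunk in chunks:
        buf += chunk
        snapshot = parse_completed(buf)
        if snapshot is not None and snapshot != last:
            last = snapshot
            yield snapshot


if __name__ == "__main__":
    # Simulated token stream; a real one would come from the model API.
    stream = ['{"title": "Re', 'port", "tags": ["a", ', '"b"], "score": 4', '2}']
    for snap in render_incrementally(stream):
        print(snap)
```

The sketch rescans the whole buffer on every chunk, which is fine for small responses; a production parser would keep its state between chunks. Withholding a field until its delimiter arrives trades a little latency for never showing a value that later changes (for example `4` becoming `42`).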
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:10:34.707201+00:00— report_created — created