Report #91611
[gotcha] Streaming breaks when returning JSON or structured data from LLM calls
Use partial JSON parsing (e.g., via the Vercel AI SDK's `streamObject` or the `best-effort-json-parser` package) to extract valid partial values from an incomplete JSON prefix. Alternatively, use a two-phase approach: stream a plain-text summary first, then deliver the complete structured payload as a single event after generation finishes.
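To make the partial-parsing idea concrete, here is a minimal sketch in plain TypeScript. It is not how `streamObject` or `best-effort-json-parser` work internally; `parsePartialJson` is a hypothetical helper that closes any open string, array, or object and then tries `JSON.parse`. It assumes the caller keeps the last good value whenever the prefix cannot be repaired yet (for example, when it ends on a dangling key).

```typescript
// Hypothetical helper (not part of any library): try to turn a truncated
// JSON prefix into something JSON.parse accepts. Returns undefined when the
// prefix cannot be repaired yet, so the caller can keep its previous value.
function parsePartialJson(prefix: string): unknown {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;

  // Track open strings, objects, and arrays.
  for (let i = 0; i < prefix.length; i++) {
    const ch = prefix.charAt(i);
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }

  let candidate = prefix;
  if (inString) {
    // Drop a dangling backslash, then close the open string.
    if (escaped) candidate = candidate.slice(0, -1);
    candidate += '"';
  } else {
    // Drop a trailing comma or whitespace left by the cut.
    candidate = candidate.replace(/[,\s]+$/, "");
  }
  candidate += closers.slice().reverse().join("");

  try {
    return JSON.parse(candidate);
  } catch {
    return undefined; // e.g. a dangling key or a half-written literal
  }
}

// Simulated stream: the chunk boundaries here are illustrative only.
const streamedChunks = ['{"title": "Str', 'eaming", "tags": ["a", ', '"b"]}'];
let received = "";
let latest: unknown = undefined;
const snapshots: string[] = [];
for (const chunk of streamedChunks) {
  received += chunk;
  const partial = parsePartialJson(received);
  if (partial !== undefined) latest = partial; // keep last good state
  snapshots.push(JSON.stringify(latest));
}
console.log(snapshots);
```

Each snapshot is a valid object that a UI could render immediately. A production parser also needs to handle cases this sketch gives up on, such as half-written numbers, literals, and keys.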
Journey Context:
The fundamental tension: streaming improves perceived latency, but structured output must be complete before it can be parsed. Developers set `stream: true` with `response_format: { type: "json_object" }` and try to `JSON.parse` each chunk, which fails because chunks arrive at arbitrary boundaries that cut the JSON mid-key or mid-value. The real gotcha: even if you buffer until a complete JSON object forms, the model might output multiple objects or wrap them in markdown code fences. The Vercel AI SDK's `streamObject` solves this with a Zod-schema-aware partial parser that can yield valid partial state from incomplete JSON. Without this, teams feel forced to choose between streaming UX and structured output, a false choice that leads to painful rearchitecting.
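The failure mode described above can be reproduced without any SDK. The sketch below assumes a simulated stream whose chunk boundaries are made up for illustration; `firstJsonObject` is a hypothetical helper that scans the buffer for the first balanced top-level object, so a leading code fence or a second object after it does not break parsing.

```typescript
// Hypothetical helper: return the first complete top-level JSON object in
// `raw`, ignoring text around it (code fences, later objects). Returns
// undefined while that object is still incomplete.
function firstJsonObject(raw: string): unknown {
  const start = raw.indexOf("{");
  if (start === -1) return undefined;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < raw.length; i++) {
    const ch = raw.charAt(i);
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return JSON.parse(raw.slice(start, i + 1));
    }
  }
  return undefined;
}

// Three backticks, split so this sample can sit inside a fenced block.
const FENCE = "`" + "``";

// Simulated chunks: a fenced object cut mid-key and mid-value, followed by
// a second object the caller did not ask for.
const chunks = [
  FENCE + 'json\n{"na',
  'me": "Ada", "sco',
  're": 42}',
  "\n" + FENCE + '\n{"name": "Bob"}',
];

let buffer = "";
let parsed: unknown = undefined;
let perChunkFailures = 0;
for (const chunk of chunks) {
  // Naive approach: parsing a chunk on its own throws every time here.
  try {
    JSON.parse(chunk);
  } catch {
    perChunkFailures++;
  }
  // Buffered approach: accumulate, then take the first complete object.
  buffer += chunk;
  if (parsed === undefined) parsed = firstJsonObject(buffer);
}
console.log(perChunkFailures, JSON.stringify(parsed));
// prints: 4 {"name":"Ada","score":42}
```

This only recovers the value once the object is complete, so it fixes correctness but not perceived latency. It also assumes no stray `{` appears in text before the object.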
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:21:38.062438+00:00— report_created — created