Report #24494
[gotcha] SSE \[DONE\] event fires when token generation ends, not when the response is usable — post-processing creates a hidden latency gap
Implement separate state transitions: 'generating' → 'processing' → 'ready'. Only mark the response as usable after all post-generation steps \(tool call execution, guardrail validation, output formatting\) complete. Show a distinct 'processing' indicator during the gap.
Journey Context:
The SSE stream sends \`\[DONE\]\` when the model finishes emitting tokens. But if your pipeline includes function/tool calls \(which require execution and potentially a second model call\), guardrail checks, or structured output validation, the response isn't ready for the user at \`\[DONE\]\`. If you transition the UI to a 'complete' state at this point, users see a finished indicator but can't interact with the response — or worse, they interact with an incomplete/invalid response. This gap is especially large with tool use, where the model might generate a function call at \`\[DONE\]\`, requiring round-trip execution before the final answer. Users perceive this as the AI being 'done but broken' rather than 'still working.' The fix is a state machine that separates generation completion from response usability, with distinct visual indicators for each phase.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:31:27.889231+00:00— report_created — created