Report #74141
[gotcha] Users can't tell if a slow-streaming AI is still generating or has finished with a short response
Always show a clear 'generating' indicator (pulsing cursor, typing animation) tied to the stream connection state, not to token arrival timing. Only remove the indicator when the stream explicitly closes with a done signal. Never infer completion from a pause in token arrival.
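A minimal sketch of this rule, assuming the client can observe five lifecycle events from its transport: open, token, done, error and closed. The event and status names are hypothetical and do not come from any specific SDK. The indicator is derived only from `status`, and token events never change `status`, so a pause between tokens cannot hide it. Treating a close that arrives without a done signal as "interrupted" rather than complete is an added assumption, not part of the original report.

```typescript
// Hypothetical stream lifecycle events (illustrative names, not a real SDK API).
type StreamEvent =
  | { kind: "open" }
  | { kind: "token"; text: string }
  | { kind: "done" } // explicit completion signal sent by the server
  | { kind: "error"; message: string }
  | { kind: "closed" }; // transport closed; may or may not follow "done"

type Status = "idle" | "generating" | "done" | "interrupted";

interface StreamState {
  status: Status;
  text: string;
}

const initialState: StreamState = { status: "idle", text: "" };

// Pure reducer: status changes only on connection events, never on token timing.
function reduce(state: StreamState, event: StreamEvent): StreamState {
  switch (event.kind) {
    case "open":
      return { status: "generating", text: "" };
    case "token":
      // Tokens append text but leave status untouched.
      return state.status === "generating"
        ? { ...state, text: state.text + event.text }
        : state;
    case "done":
      return state.status === "generating" ? { ...state, status: "done" } : state;
    case "error":
      return state.status === "generating" ? { ...state, status: "interrupted" } : state;
    case "closed":
      // Assumption: a close with no prior "done" means the answer may be truncated.
      return state.status === "generating" ? { ...state, status: "interrupted" } : state;
  }
}

// The indicator is visible exactly while the stream is open and not yet done.
function showGeneratingIndicator(state: StreamState): boolean {
  return state.status === "generating";
}
```

In a browser, these events would typically be produced by an `EventSource` or `WebSocket` handler. How the server encodes its done signal (a dedicated SSE event, a sentinel payload, a final WebSocket message) depends on the API in use and is not specified here.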
Journey Context:
With streaming responses, every pause between tokens is ambiguous: is the AI still working, or is it done? Network latency and model processing create natural gaps. Users see a response that looks complete (a full sentence, a coherent paragraph) and start acting on it, only for more tokens to arrive and change the meaning. Or they see what looks like a complete but very short answer, assume the AI has nothing more to say, and navigate away. The fix sounds simple: show a generating indicator. The implementation is trickier, because the indicator must be tied to the stream's actual connection state (the SSE connection or WebSocket being open), not to a heuristic like 'no tokens for N milliseconds'. Token arrival timing is unreliable because of network jitter and variance in model processing time, so a heuristic-based indicator will flicker on and off, which is worse than no indicator at all.
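To make the flicker concrete, the sketch below replays one hypothetical token timeline through both approaches. The first is a "no tokens for 500 ms" heuristic. The second is an indicator that stays on while the stream is open, until the done signal arrives. The token times, the 500 ms threshold and the 50 ms sampling step are illustrative assumptions, not measurements.

```typescript
// Hypothetical timeline: token arrival times in ms since the request started.
interface Timeline {
  tokenTimes: number[];
  doneAt: number; // when the server's explicit done signal arrives
}

// Anti-pattern: "generating" while the last activity is less than gapMs old.
function heuristicVisible(t: Timeline, now: number, gapMs: number): boolean {
  // Last activity is the request start (0) or the most recent token received so far.
  let last = 0;
  for (const x of t.tokenTimes) if (x <= now && x > last) last = x;
  return now - last < gapMs;
}

// Connection-state approach: visible until the done signal closes the stream.
function connectionVisible(t: Timeline, now: number): boolean {
  return now < t.doneAt;
}

// Samples visibility every stepMs and counts on/off transitions.
function countToggles(visible: (now: number) => boolean, endMs: number, stepMs: number): number {
  let toggles = 0;
  let prev = visible(0);
  for (let now = stepMs; now <= endMs; now += stepMs) {
    const cur = visible(now);
    if (cur !== prev) toggles++;
    prev = cur;
  }
  return toggles;
}

const sample: Timeline = { tokenTimes: [100, 150, 200, 900, 950, 1800], doneAt: 2000 };
const heuristicToggles = countToggles((n) => heuristicVisible(sample, n, 500), 2500, 50);
const connectionToggles = countToggles((n) => connectionVisible(sample, n), 2500, 50);
console.log(heuristicToggles, connectionToggles); // prints "5 1"
```

With these numbers, the heuristic hides the indicator at 700 ms while three more tokens are still on the way. It also keeps the indicator on until 300 ms after the stream has actually finished. The connection-state indicator changes exactly once.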
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:02:36.442241+00:00— report_created — created