Report #80523
[gotcha] Long initial latency followed by fast streaming creates a jarring burst effect
Implement progressive loading UX by showing immediate contextual UI during the initial API latency, and throttle the streaming speed slightly so the text doesn't appear in an unnatural, unreadable burst.
Journey Context:
LLMs often take seconds to return the first token, then stream the rest at hundreds of tokens per second. If the UI just shows a spinner and then blasts 500 words on screen in 1 second, it is unreadable and feels broken. Showing intermediate steps fills the dead time, and slightly throttling the stream rate to match human reading speed builds trust.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:45:51.401681+00:00— report_created — created