Report #78783
[gotcha] Static loading indicator during AI prefill latency makes users think the system is hung
Implement progressive status messages that advance on a timer during first-token latency (e.g., 'Reading your request...' → 'Analyzing context...' → 'Generating response...'), and transition to streaming display the instant the first token arrives.
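A minimal sketch of the timer-driven part, in TypeScript. The `Stage` type, the `statusForElapsed` function, and the 2 s / 5 s thresholds are hypothetical choices for illustration; only the message wording comes from this report. The function is pure, so a UI tick can call it with the elapsed wait time. It holds on the last message instead of ever implying that prefill is finished.

```typescript
// Hypothetical stage list: wording from the report, thresholds assumed.
interface Stage {
  afterMs: number; // show this message once this much time has elapsed
  message: string;
}

const DEFAULT_STAGES: Stage[] = [
  { afterMs: 0, message: "Reading your request..." },
  { afterMs: 2000, message: "Analyzing context..." },
  { afterMs: 5000, message: "Generating response..." },
];

// Maps elapsed wait time to a status message. This is time-based, not
// event-based: it reports nothing about real prefill progress, and it
// stays on the last stage rather than claiming completion.
function statusForElapsed(elapsedMs: number, stages: Stage[] = DEFAULT_STAGES): string {
  let current = stages[0]?.message ?? "";
  for (const stage of stages) {
    if (elapsedMs >= stage.afterMs) {
      current = stage.message;
    } else {
      break; // stages are assumed to be sorted by afterMs
    }
  }
  return current;
}
```

For example, `statusForElapsed(2500)` yields "Analyzing context...", and any wait past 5 s keeps showing "Generating response..." until the first token replaces it.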
Journey Context:
Before the first token streams, the model must process the entire prompt context (prefill), which can take 2-15+ seconds for long contexts. During this window, a spinner or 'thinking...' message sits completely static, so users read it as a hang and refresh or resubmit. The gotcha: the system IS working, but there is zero observable progress. The naive fix is a progress bar, but it cannot reflect actual progress because prefill time is unpredictable. The right fix is staged status messages that create perceived progress without lying about actual progress. This applies the labor illusion (visibly showing work increases perceived value and patience), even though the messages are driven by elapsed time rather than by real events.
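The hand-off from staged status to streaming can be sketched as a wrapper around the token stream. It assumes the model client exposes tokens as an `AsyncIterable<string>` and that the UI accepts a `render` callback; `renderWithStagedStatus`, `View`, and the 2.5 s step are hypothetical names and values, not any particular SDK's API. The interval advances the status only while waiting and is cleared the moment the first token arrives, so a late timer tick can never overwrite streamed text.

```typescript
// Hypothetical view shape; a real UI would map this onto component state.
type View = { kind: "status" | "stream"; text: string };

async function renderWithStagedStatus(
  tokens: AsyncIterable<string>,
  render: (view: View) => void,
  messages: string[] = [
    "Reading your request...",
    "Analyzing context...",
    "Generating response...",
  ],
  stepMs = 2500, // assumed interval between status stages
): Promise<string> {
  let stage = 0;
  render({ kind: "status", text: messages[stage] });

  // During prefill, advance one stage per tick and hold on the last one.
  const timer = setInterval(() => {
    if (stage < messages.length - 1) {
      stage += 1;
      render({ kind: "status", text: messages[stage] });
    }
  }, stepMs);

  let text = "";
  let firstTokenSeen = false;
  try {
    for await (const token of tokens) {
      if (!firstTokenSeen) {
        // First token: stop the status timer before showing any output.
        clearInterval(timer);
        firstTokenSeen = true;
      }
      text += token;
      render({ kind: "stream", text });
    }
  } finally {
    // Also covers streams that throw or end without producing a token.
    clearInterval(timer);
  }
  return text;
}
```

Clearing the interval in the `finally` block as well means a failed or empty stream does not leave an orphaned timer cycling through status messages.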
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:50:03.529114+00:00— report_created — created