Report #64007
[gotcha] Simple AI queries feel broken because latency doesn't scale with output complexity
Implement perceived-complexity signaling: show a brief 'Analyzing...' state even for simple queries. Route simple or short-answer queries to faster models when possible. Optimize first-token latency separately from total generation time. Never leave a yes/no question in apparent silence for 3+ seconds without UI feedback.
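As a rough illustration of the routing and signaling advice above, here is a minimal Python sketch. The model names (`fast-small`, `full-large`), the 12-word cutoff, and the opener word list are all assumptions made for the example, not values from this report; a real router would likely use a classifier or provider-specific metadata instead of this crude heuristic.

```python
import re
from typing import Optional

# Hypothetical model tiers (assumed names); substitute whatever your provider offers.
FAST_MODEL = "fast-small"
FULL_MODEL = "full-large"

# Openers that often signal a yes/no or short-answer question (a heuristic, not a rule).
_SHORT_OPENERS = re.compile(
    r"^(is|are|was|were|do|does|did|can|could|should|will|would|has|have)\b",
    re.IGNORECASE,
)


def pick_model(query: str, max_simple_words: int = 12) -> str:
    """Pick a model tier from a crude length-and-opener check on the query."""
    text = query.strip()
    is_short = len(text.split()) <= max_simple_words
    looks_closed = bool(_SHORT_OPENERS.match(text)) and text.endswith("?")
    return FAST_MODEL if is_short and looks_closed else FULL_MODEL


def status_label(first_token_seen: bool) -> Optional[str]:
    """Return the label to show while waiting; None once output has started."""
    # Shown from the moment the query is sent, so even a simple
    # question never sits in apparent silence.
    return None if first_token_seen else "Analyzing..."
```

For example, `pick_model("Is this email spam?")` selects the fast tier under these assumptions, while a request that opens with "Write a detailed analysis..." falls through to the full model.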
Journey Context:
Users have a deeply ingrained mental model: simple questions get fast answers, complex questions take longer. A calculator returns 2+2=4 instantly; a human takes longer to write an essay than to say 'yes.' But LLM inference doesn't work this way: the time to first token is dominated by prompt processing and model inference overhead, which depends on the input rather than the output and is roughly the same whether the answer is 'yes' or 500 words. A user asking 'Is this email spam?' expects an instant answer but waits the same 2-3 seconds for the first token as someone requesting a detailed analysis. This latency expectation inversion makes the AI feel broken for simple queries even when absolute latency is reasonable. Nielsen's response-time limits put about 1 second as the point up to which a user's flow of thought stays uninterrupted; beyond that, the delay is noticed and the system starts to feel unresponsive. The fix isn't just 'make it faster'. It is aligning perceived latency with expected complexity through UI signaling and model routing.
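To see the two latencies separately, time-to-first-token can be measured apart from total generation time. The sketch below assumes the response arrives as an iterable of token strings; `measure_stream` is a hypothetical helper name, and the injectable `clock` parameter exists only so the example can be checked deterministically.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, float, float]:
    """Consume a token stream and return (text, time_to_first_token, total_time) in seconds."""
    start = clock()
    first_token_at: Optional[float] = None
    parts: List[str] = []
    for tok in tokens:
        if first_token_at is None:
            # Stamp the arrival of the first token only.
            first_token_at = clock()
        parts.append(tok)
    end = clock()
    # If the stream was empty, report the full wait as time-to-first-token.
    ttft = (first_token_at if first_token_at is not None else end) - start
    return "".join(parts), ttft, end - start
```

With a one-token answer such as 'yes', the two numbers come out nearly identical, which is the inversion described above: almost all of the wait happens before anything is shown.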
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:55:31.200494+00:00 - report_created - created