Report #60685
[cost\_intel] At what latency does a reasoning model become unusable for a real-time UI?
Do not use reasoning models for any user-facing operation that needs a Time to First Byte \(TTFB\) under 2 seconds. Use GPT-4o with streaming for chat UX instead, and move heavy reasoning off the request path, either into async background jobs that the client polls or into pre-computation.
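The rule above can be sketched as a small routing function. This is a minimal illustration, not a reference implementation: `Plan`, `choose_plan`, and the labels `"streaming_chat_model"` and `"reasoning_model"` are hypothetical names introduced here, and the only value taken from the report is the 2-second TTFB threshold.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold taken from this report; everything else below is illustrative.
TTFB_THRESHOLD_S = 2.0


@dataclass(frozen=True)
class Plan:
    foreground: str            # what answers the user on the request path
    background: Optional[str]  # what runs off the request path, if anything


def choose_plan(ttfb_budget_s: float, needs_heavy_reasoning: bool) -> Plan:
    """Pick where each model runs for a user-facing operation."""
    if ttfb_budget_s < TTFB_THRESHOLD_S:
        # Tight budget: stream a fast model to the user and keep any
        # reasoning work out of the critical path.
        background = "reasoning_model" if needs_heavy_reasoning else None
        return Plan("streaming_chat_model", background)
    # Relaxed budget: a reasoning model may answer directly if it is needed.
    foreground = "reasoning_model" if needs_heavy_reasoning else "streaming_chat_model"
    return Plan(foreground, None)
```

For example, `choose_plan(1.0, True)` keeps the streaming model in the foreground and schedules the reasoning model as background work, while `choose_plan(5.0, True)` lets the reasoning model answer directly.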
Journey Context:
o1-mini takes 5-30 s to respond to complex prompts, while UX research suggests users abandon tasks after about 3 s of waiting. A synchronous chat interface that leaves users staring at a loading spinner for 10 s therefore fails. The mistake is putting reasoning in the critical path. The 2 s threshold is Nielsen's 1.0 s limit for keeping the user's flow of thought uninterrupted, plus an allowance for network overhead. Alternatives such as optimistic UI or async workers with polling keep perceived performance acceptable.
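The async-worker-with-polling alternative might look like the following sketch. It assumes a single-process, in-memory job store built on `asyncio`. `JobStore`, `slow_reasoning_call`, and `demo` are hypothetical names, and the short `asyncio.sleep` only stands in for a multi-second reasoning call, since no real model API is invoked.

```python
import asyncio
import uuid
from typing import Awaitable, Callable, Dict, List, Optional


class JobStore:
    """In-memory job registry; a real deployment would likely use a queue or database."""

    def __init__(self) -> None:
        self._tasks: Dict[str, "asyncio.Task[str]"] = {}

    def submit(self, work: Callable[[], Awaitable[str]]) -> str:
        # Start the work in the background and hand back an id immediately.
        job_id = uuid.uuid4().hex
        self._tasks[job_id] = asyncio.ensure_future(work())
        return job_id

    def poll(self, job_id: str) -> Optional[str]:
        # Return the result if the job has finished, otherwise None.
        task = self._tasks[job_id]
        return task.result() if task.done() else None


async def slow_reasoning_call() -> str:
    # Placeholder for a slow reasoning-model request (assumption).
    await asyncio.sleep(0.2)
    return "final answer"


async def demo() -> List[str]:
    store = JobStore()
    job_id = store.submit(slow_reasoning_call)
    events = ["ack"]  # the UI can acknowledge the request right away
    while True:
        result = store.poll(job_id)
        if result is not None:
            events.append(result)
            return events
        events.append("pending")   # the UI shows progress instead of blocking
        await asyncio.sleep(0.05)  # polling interval (illustrative value)
```

Running `asyncio.run(demo())` yields an immediate `"ack"`, some number of `"pending"` entries, and then the result. The point of the design is that the user-facing request returns before the slow call completes.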
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:20:47.341476+00:00— report_created — created