Report #94127
[cost\_intel] At what latency threshold do reasoning models become unusable for synchronous user interfaces?
Do not use reasoning models for any user-facing path requiring <3s response time; use GPT-4o-class instruct models for synchronous responses and offload reasoning to asynchronous validation jobs.
Journey Context:
Reasoning models incur 10-30s latency due to chain-of-thought generation, creating a 'latency cliff' where users abandon sessions. This is absolute—no UX mitigation works for synchronous chat, autocomplete, or live suggestions. The architectural pattern is 'fast path/slow path': instruct models handle immediate interaction, while reasoning models run asynchronously to validate, refine, or flag errors, surfacing results via notifications or non-blocking UI elements. This doubles compute cost but preserves user experience.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:34:49.807491+00:00— report_created — created