Report #65707
[cost_intel] At what latency threshold do reasoning models (o1/o3) become unusable for interactive chat interfaces?
Do not put reasoning models on synchronous UX paths when their Time-To-First-Token (TTFT) exceeds 800ms. Use GPT-4o (TTFT ~300ms) for the interactive response and offload reasoning to background async jobs, or use 'fast edit' mode.
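As a rough illustration of the 800ms rule, the sketch below times the first token of a stream and routes a model to the synchronous or asynchronous path. The names `measure_ttft_ms`, `pick_path`, and `TTFT_BUDGET_MS` are hypothetical helpers for this example, not part of any SDK, and the token stream is assumed to be a lazy iterator that only issues the request once iteration starts.

```python
import time
from typing import Iterable

# Interactive budget taken from the recommendation above (assumption: milliseconds).
TTFT_BUDGET_MS = 800.0


def measure_ttft_ms(token_stream: Iterable[str]) -> float:
    """Return the milliseconds elapsed until the stream yields its first token."""
    start = time.perf_counter()
    for _ in token_stream:
        # Stop at the first token; the rest of the stream is not consumed.
        return (time.perf_counter() - start) * 1000.0
    raise ValueError("stream produced no tokens")


def pick_path(ttft_ms: float, budget_ms: float = TTFT_BUDGET_MS) -> str:
    """Return 'sync' if the model fits the interactive budget, else 'async'."""
    return "sync" if ttft_ms <= budget_ms else "async"


def fake_stream(delay_s: float):
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(delay_s)  # simulates waiting for the first token
    yield "hello"
    yield " world"
```

In practice a single measurement is noisy, so a percentile over many requests (for example p95) is probably a better input to `pick_path` than one sample.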
Journey Context:
Human perception studies show that delays over 1 second break flow state in typing contexts. o1-preview has a TTFT of 3-10 seconds depending on reasoning effort, which makes it unusable for real-time pair programming or chat. The cost is not just money ($60 vs $5 per 1M tokens) but user abandonment. The fix is a 'fast path/slow path' architecture: GPT-4o answers immediately, while a background o1 call produces a 'deep analysis' that streams in later. Alternatively, use o1-mini, which hits roughly 800ms TTFT at acceptable quality for code review.
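The 'fast path/slow path' idea can be sketched with `asyncio`. This is a minimal sketch under stated assumptions, not a definitive implementation: `fast_slow_reply`, `stub_fast`, and `stub_slow` are hypothetical names, and the stubs only simulate latency with `asyncio.sleep` where real model clients would be called.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Tuple

# A model call is assumed to be any coroutine taking a prompt and returning text.
ModelCall = Callable[[str], Awaitable[str]]


async def fast_slow_reply(
    prompt: str, fast_model: ModelCall, slow_model: ModelCall
) -> AsyncIterator[Tuple[str, str]]:
    """Yield the fast answer first, then the deep analysis once it is ready."""
    # Start the slow call immediately so it runs while the fast path responds.
    slow_task = asyncio.create_task(slow_model(prompt))
    try:
        yield ("fast", await fast_model(prompt))
        yield ("deep", await slow_task)
    finally:
        # If the consumer stops early (user navigated away), drop the slow call.
        if not slow_task.done():
            slow_task.cancel()


async def stub_fast(prompt: str) -> str:
    """Hypothetical low-latency model (stands in for the GPT-4o path)."""
    await asyncio.sleep(0.01)
    return f"quick answer to: {prompt}"


async def stub_slow(prompt: str) -> str:
    """Hypothetical reasoning model (stands in for the background o1 call)."""
    await asyncio.sleep(0.05)
    return f"deep analysis of: {prompt}"


async def collect(prompt: str) -> List[Tuple[str, str]]:
    """Gather both stages in arrival order, as a UI would render them."""
    return [item async for item in fast_slow_reply(prompt, stub_fast, stub_slow)]
```

Calling `asyncio.run(collect("review this diff"))` returns the fast item followed by the deep item. Starting the slow task before awaiting the fast one means the user-visible wait for the deep analysis is roughly the slow model's latency rather than the sum of both.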
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:46:17.971002+00:00— report_created — created