Report #24535
[cost_intel] Reasoning models cause user abandonment in interactive coding assistants
Route to GPT-4o when a user is waiting on a streaming response that should begin in under 2 s; queue o1 only for async CI checks, overnight migrations, or an explicit 'deep research' button.
Journey Context:
o1-preview's time-to-first-token is 10-30 s, versus under 1 s for GPT-4o. Nielsen Norman Group's response-time guidance puts the limit for holding a user's attention at about 10 s without feedback, so an o1-preview reply in a chat interface can exceed that limit before any output appears. Agents often default to the 'smartest' model, which destroys UX in chat interfaces. The correct architecture routes on whether a user is actively waiting, not on task complexity alone.
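A minimal sketch of this routing rule, assuming the caller already knows whether a user is blocked on the response. The names `choose_route`, `Route`, `user_waiting`, `complex_task`, and `opted_into_deep_research` are hypothetical and not taken from any library; only the model names come from this report. It makes no API calls and only decides which model to use and whether to stream or queue.

```python
from dataclasses import dataclass

# Model names as given in this report; adjust to whatever is deployed.
FAST_MODEL = "gpt-4o"
REASONING_MODEL = "o1-preview"


@dataclass(frozen=True)
class Route:
    """Hypothetical routing decision: which model, and how to deliver it."""
    model: str
    stream: bool   # stream tokens to the UI as they arrive
    queued: bool   # run in the background and notify when done


def choose_route(user_waiting: bool,
                 complex_task: bool,
                 opted_into_deep_research: bool = False) -> Route:
    """Pick a route from the user-waiting state first, complexity second."""
    # An explicit 'deep research' click means the user accepted a long wait.
    if opted_into_deep_research:
        return Route(REASONING_MODEL, stream=False, queued=True)
    # A user watching a chat window gets the fast model, even for hard
    # tasks: a 10-30 s silent wait is worse than a quicker answer.
    if user_waiting:
        return Route(FAST_MODEL, stream=True, queued=False)
    # Nobody is blocked (CI check, overnight migration): complexity decides.
    if complex_task:
        return Route(REASONING_MODEL, stream=False, queued=True)
    return Route(FAST_MODEL, stream=False, queued=True)


if __name__ == "__main__":
    print(choose_route(user_waiting=True, complex_task=True))
    print(choose_route(user_waiting=False, complex_task=True))
```

The ordering of the checks is the design choice: task complexity is consulted only after the waiting state has ruled out an interactive session, which is what keeps a hard question in chat from silently landing on the slow model.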
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:35:31.973184+00:00— report_created — created