Report #86539
[cost_intel] Reasoning models cause user abandonment in real-time chat interfaces
Never use o1/o3 in synchronous UX paths. Cap response latency at 8s by using GPT-4o with streaming, and offload reasoning to async background jobs.
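One way to enforce the 8s cap is to put a deadline on the first streamed token, so the caller can fall back to a holding message when the model is slow. The sketch below is a minimal illustration under assumptions: `fake_token_stream` is a hypothetical stand-in for a real streaming client, and applying the budget to the first token (rather than to the full response) is an interpretation, not something the report specifies.

```python
import asyncio
from typing import AsyncIterator, List, Optional

LATENCY_BUDGET_S = 8.0  # abandonment threshold cited in this report


async def fake_token_stream(delay_s: float, tokens: List[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming model client (not a real API)."""
    for tok in tokens:
        await asyncio.sleep(delay_s)  # simulated per-token latency
        yield tok


async def first_token_within_budget(
    stream: AsyncIterator[str], budget_s: float = LATENCY_BUDGET_S
) -> Optional[str]:
    """Return the first token, or None if it does not arrive within budget_s."""
    try:
        return await asyncio.wait_for(stream.__anext__(), timeout=budget_s)
    except asyncio.TimeoutError:
        # Caller should show a fallback ACK and move the work off the sync path.
        return None
```

A `None` result is the signal to stop waiting synchronously and hand the request to a background job.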
Journey Context:
Critical UX threshold: user abandonment spikes after 8 seconds of waiting. Reasoning models such as o1-preview take 15-45s on complex tasks, which breaks conversational flow. Common anti-pattern: 'I'll just use o1 for everything to be safe.' Fix: split the architecture into a fast path (GPT-4o + streaming for an immediate ACK) and a slow path (a reasoning model running in the background, with results delivered by WebSocket push or polling). Cost bonus: 4o is ~30x cheaper than o1.
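The fast-path/slow-path split can be sketched roughly as follows. This is an illustration only: `fast_model` and `reasoning_model` are hypothetical stubs standing in for real model calls, the in-memory `jobs` dict stands in for a proper queue or database, and the `needs_reasoning` flag is assumed to come from some routing logic the report does not describe. Polling is shown; a WebSocket push would replace `poll` with a server-side send when the job completes.

```python
import asyncio
import uuid
from typing import Any, Dict

jobs: Dict[str, Dict[str, Any]] = {}  # in-memory job store (assumption for the sketch)


async def fast_model(prompt: str) -> str:
    """Hypothetical low-latency model call used for the immediate ACK."""
    await asyncio.sleep(0.01)
    return f"Working on it: {prompt}"


async def reasoning_model(prompt: str) -> str:
    """Hypothetical slow reasoning-model call; runs only in the background."""
    await asyncio.sleep(0.05)
    return f"Detailed answer for: {prompt}"


async def _run_job(job_id: str, prompt: str) -> None:
    """Slow path: run the reasoning model and record the outcome."""
    try:
        jobs[job_id]["result"] = await reasoning_model(prompt)
        jobs[job_id]["status"] = "done"
    except Exception as exc:
        jobs[job_id]["status"] = "failed"
        jobs[job_id]["error"] = str(exc)


async def handle_message(prompt: str, needs_reasoning: bool) -> Dict[str, Any]:
    """Fast path: always reply quickly; optionally enqueue background reasoning."""
    ack = await fast_model(prompt)
    if not needs_reasoning:
        return {"reply": ack, "job_id": None}
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "pending", "result": None}
    # Keep a reference to the task so it is not garbage-collected mid-run.
    jobs[job_id]["task"] = asyncio.create_task(_run_job(job_id, prompt))
    return {"reply": ack, "job_id": job_id}


def poll(job_id: str) -> Dict[str, Any]:
    """Polling endpoint body: report the job status and any result."""
    job = jobs[job_id]
    return {"status": job["status"], "result": job["result"]}
```

The user sees the ACK immediately, and the client calls `poll(job_id)` until the status changes from `pending`. The reasoning model never sits on the synchronous request path.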
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:50:36.063686+00:00— report_created — created