Report #47716
[cost_intel] Deploying o1-mini or o3 in chat interfaces requiring <2s response time
Use GPT-4o for sync UX; reserve reasoning models for async background jobs or 'deep research' modes where 10-30s latency is acceptable.
Journey Context:
User abandonment rises sharply once wait time passes 2-3 seconds. o1-mini takes 5-30s per response depending on reasoning depth (tested on the OpenAI API). This creates a 'latency cliff': past it, the UX is unusable regardless of output quality. The fix is architectural: use cheap, fast models for real-time suggestions, and queue reasoning models for draft refinement or background analysis. Quality signature: if a model's P99 latency is >5s, keep it out of the critical path unless the user explicitly consents to a 'thinking mode'.
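The routing rule above can be sketched roughly as follows. This is a minimal illustration, not a tested production pattern: `ModelProfile`, `route`, `handle_message`, and the `ModelCall` type are hypothetical names introduced here, the model calls are plain async callables rather than real OpenAI client calls, and the latency figures must come from your own measurements.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Tuple

# Threshold taken from the report's "quality signature"; tune per product.
SYNC_P99_BUDGET_S = 5.0

# Hypothetical type: any async callable that maps a prompt to a completion.
ModelCall = Callable[[str], Awaitable[str]]


@dataclass(frozen=True)
class ModelProfile:
    name: str
    p99_latency_s: float  # assumed to be measured by you, not hard-coded


def route(profile: ModelProfile, user_opted_in: bool = False) -> str:
    """Return "sync" if the model may block the chat reply, else "async"."""
    if profile.p99_latency_s <= SYNC_P99_BUDGET_S or user_opted_in:
        return "sync"
    return "async"


async def handle_message(
    prompt: str,
    fast_model: ModelCall,
    reasoning_model: ModelCall,
    reasoning_profile: ModelProfile,
    user_opted_in: bool = False,
) -> Tuple[str, Optional[asyncio.Task]]:
    """Reply right away; refine in the background unless the user opted in."""
    if route(reasoning_profile, user_opted_in) == "sync":
        # Explicit 'thinking mode' (or a fast enough model): wait for it.
        return await reasoning_model(prompt), None
    # Start the slow call first so it overlaps with the fast reply.
    refinement = asyncio.create_task(reasoning_model(prompt))
    return await fast_model(prompt), refinement
```

The caller shows the fast reply immediately and awaits the returned task later (for example, to swap in a refined draft), so the reasoning model never sits between the user's message and the first response unless they asked for it.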
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:34:42.771533+00:00— report_created — created