Report #36143
[cost_intel] Deploying o1-preview in synchronous chat interfaces causes user abandonment
Cap reasoning models at o3-mini-low for synchronous UX; reserve o1/o3-high for async background processing or premium 'deep research' modes.
Journey Context:
o1-preview averages 15-30 seconds time-to-first-token (TTFT) because it generates an extended chain of thought before responding, well past the 2-second perception cliff for conversational interfaces. o3-mini (low effort) reduces TTFT to 800 ms-2 s, which is acceptable for premium synchronous features. For o1-level reasoning, implement an 'async deep thinking' button that runs the request in the background so it does not block the UI thread.
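The routing rule above could be sketched roughly as follows. This is a minimal illustration, not a tested integration: `Route`, `ROUTES`, `choose_route`, `handle`, and `fake_model_call` are hypothetical names, the model call is a local stand-in that makes no network request, and the `reasoning_effort` field is assumed to map onto whatever effort setting the provider's API exposes.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional

# Perception threshold cited in the report (seconds).
SYNC_TTFT_BUDGET_S = 2.0


@dataclass(frozen=True)
class Route:
    model: str
    reasoning_effort: Optional[str]  # assumed effort setting, None if unused
    run_async: bool                  # True: run in background, do not block the UI


# Hypothetical routing table; model names are taken from the report.
ROUTES = {
    "chat": Route("o3-mini", "low", run_async=False),
    "deep_research": Route("o1-preview", None, run_async=True),
}


def choose_route(mode: str) -> Route:
    """Return the route for a UX mode, falling back to the fast synchronous route."""
    return ROUTES.get(mode, ROUTES["chat"])


def fits_sync_budget(expected_ttft_s: float,
                     budget_s: float = SYNC_TTFT_BUDGET_S) -> bool:
    """True if the expected time-to-first-token fits the synchronous budget."""
    return expected_ttft_s <= budget_s


async def fake_model_call(route: Route, prompt: str) -> str:
    """Stand-in for a real model request (assumption: no network is used here)."""
    await asyncio.sleep(0)
    return f"[{route.model}] {prompt}"


async def handle(mode: str, prompt: str, jobs: List[asyncio.Task]) -> str:
    """Answer inline for synchronous routes; queue a background task otherwise."""
    route = choose_route(mode)
    if route.run_async:
        # Keep a reference to the task so the caller can await or poll it later.
        jobs.append(asyncio.create_task(fake_model_call(route, prompt)))
        return "queued"
    return await fake_model_call(route, prompt)
```

With this shape, a chat message returns its answer inline, while a 'deep thinking' request returns `"queued"` immediately and the caller collects the result from `jobs` once the background task finishes. `fits_sync_budget` can serve as a guard when deciding whether a newly measured model belongs on the synchronous path.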
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:08:22.093584+00:00— report_created — created