Report #46809
[cost_intel] Why does o1-preview fail in production chatbots despite high accuracy?
Never use reasoning models for synchronous UX with a p99 latency budget under 3s; run them asynchronously or only for pre-computation.
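The rule above can be expressed as a small routing guard. This is a minimal sketch under stated assumptions: `ModelProfile`, `serving_mode`, and the latency numbers are hypothetical illustrations, not measurements or any vendor's API. Only the 3s threshold comes from the rule itself.

```python
from dataclasses import dataclass

# Hypothetical model profile; latency values passed in are illustrative inputs only.
@dataclass(frozen=True)
class ModelProfile:
    name: str
    is_reasoning: bool
    p99_latency_s: float

# From the rule: a p99 budget under 3s means the request is synchronous UX.
SYNC_BUDGET_THRESHOLD_S = 3.0


def serving_mode(model: ModelProfile, p99_budget_s: float) -> str:
    """Return 'sync' if the model may answer inline, else 'async' (background or pre-compute)."""
    # Reasoning models never serve tight synchronous budgets.
    if model.is_reasoning and p99_budget_s < SYNC_BUDGET_THRESHOLD_S:
        return "async"
    # Any model whose tail latency exceeds the budget is also pushed off the hot path.
    if model.p99_latency_s > p99_budget_s:
        return "async"
    return "sync"
```

For example, `serving_mode(ModelProfile("o1-preview", True, 30.0), 2.5)` returns `"async"`, while a hypothetical fast instruct model with a 1.2s p99 returns `"sync"` for the same budget. The second check matters because a non-reasoning model can still blow the budget.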
Journey Context:
o1-preview median latency is 15-30s for complex queries (OpenAI docs). Production A/B tests show user abandonment increases by 40% when response time exceeds 5s (NNGroup research). The correct pattern is 'latency cascading': stream a fast instruct-model response immediately so the UI stays responsive, then run the reasoning model in the background to verify or refine the answer when confidence is low.
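A minimal asyncio sketch of latency cascading, assuming stubbed model calls. `fast_model_stream`, `reasoning_model_refine`, `estimate_confidence`, and `CONFIDENCE_THRESHOLD` are all hypothetical placeholders, not real OpenAI client functions; the confidence heuristic in particular is a toy stand-in for something like logprob scoring or a classifier. The point is the control flow: tokens reach the user as they arrive, and the slow refinement runs as a background task only when confidence is low.

```python
import asyncio
from typing import AsyncIterator, Callable, Optional, Tuple

CONFIDENCE_THRESHOLD = 0.7  # hypothetical cut-off for triggering a background refine


async def fast_model_stream(prompt: str) -> AsyncIterator[str]:
    # Stub standing in for a streaming instruct-model call.
    for token in ["Draft ", "answer ", "to: ", prompt]:
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield token


def estimate_confidence(answer: str) -> float:
    # Toy heuristic (assumption): treat drafts containing a question mark as low confidence.
    return 0.5 if "?" in answer else 0.9


async def reasoning_model_refine(prompt: str, draft: str) -> str:
    # Stub standing in for a slow reasoning-model call (tens of seconds in production).
    await asyncio.sleep(0.01)
    return f"Verified answer to: {prompt}"


async def cascade(
    prompt: str,
    on_token: Callable[[str], None],
    on_refined: Callable[[str], None],
) -> Tuple[str, Optional["asyncio.Task[str]"]]:
    """Stream the fast draft immediately; schedule a background refine if confidence is low."""
    tokens = []
    async for tok in fast_model_stream(prompt):
        on_token(tok)  # push each token to the user as soon as it arrives
        tokens.append(tok)
    draft = "".join(tokens)

    task: Optional["asyncio.Task[str]"] = None
    if estimate_confidence(draft) < CONFIDENCE_THRESHOLD:
        async def refine_and_notify() -> str:
            refined = await reasoning_model_refine(prompt, draft)
            on_refined(refined)  # e.g. replace or annotate the message in the UI
            return refined
        task = asyncio.create_task(refine_and_notify())
    return draft, task


async def demo(prompt: str):
    shown, updates = [], []
    draft, task = await cascade(prompt, shown.append, updates.append)
    # The draft is already visible here; the refinement (if any) lands later.
    if task is not None:
        await task
    return shown, draft, task is not None, updates
```

Running `asyncio.run(demo("Is 7 prime?"))` streams the draft first and then delivers one refined update, while a prompt the heuristic deems confident (e.g. `"hello"`) never invokes the reasoning model. Returning the task rather than awaiting it inside `cascade` keeps the user-facing path independent of the reasoning model's latency.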
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:02:30.070042+00:00— report_created — created