Report #90189
[cost_intel] Latency threshold that makes reasoning models unusable in synchronous UX
Do not use o1/o3 for any user-facing operation that requires a time to first token (TTFT) under 2s. Use GPT-4o or Claude 3.5 Sonnet for autocomplete and live cursors, and offload reasoning to background workers.
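The rule above can be sketched as a small routing function. This is a minimal illustration, not a production router: the tier labels `"instruct"` and `"reasoning"`, the `Plan` type, and the `plan_request` function are hypothetical names, and the 2s threshold is taken directly from this report.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold taken from this report; tune it for your own product.
SYNC_TTFT_BUDGET_S = 2.0


@dataclass(frozen=True)
class Plan:
    """Which model tier answers inline and which (if any) runs in the background."""
    sync_tier: Optional[str]        # hypothetical labels: "instruct" or "reasoning"
    background_tier: Optional[str]


def plan_request(ttft_budget_s: Optional[float], needs_reasoning: bool) -> Plan:
    """Route a request from its TTFT budget.

    ttft_budget_s=None means no user is waiting (CI/CD, nightly reports).
    """
    if ttft_budget_s is None:
        # Background job: latency is absorbed, so use whichever tier fits the task.
        return Plan(None, "reasoning" if needs_reasoning else "instruct")
    if ttft_budget_s < SYNC_TTFT_BUDGET_S:
        # Tight interactive budget: answer inline with the fast tier and,
        # if deeper analysis is wanted, queue it for a background worker.
        return Plan("instruct", "reasoning" if needs_reasoning else None)
    # The report only rules out budgets under 2s; whether a longer wait
    # is acceptable here is a product decision, assumed OK in this sketch.
    return Plan("reasoning" if needs_reasoning else "instruct", None)
```

For example, `plan_request(0.3, needs_reasoning=True)` serves an autocomplete request from the fast tier and queues the reasoning pass, while `plan_request(None, needs_reasoning=True)` sends a nightly report straight to the reasoning tier.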
Journey Context:
Reported TTFT for o1 is 5-30s depending on reasoning effort (figure attributed to OpenAI's o1 system card). This creates a UX "cliff": a commonly cited web-performance figure is that user engagement drops by more than 50% once the wait exceeds 3s. Instruct models provide a TTFT under 1s. The only exception is when partial reasoning tokens can be streamed to the user, which is not yet widely available. For synchronous UX, the roughly 10x latency increase is prohibitive regardless of the quality gain; background jobs (CI/CD, nightly reports) can absorb the latency.
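Since the figures above vary by model and reasoning effort, it may be worth measuring TTFT against your own endpoint before committing to a route. The sketch below times the first chunk from any token iterator; `measure_ttft`, `within_sync_budget`, and the `start_stream` callable are hypothetical names, and the stream is assumed to be whatever your client library yields when streaming is enabled.

```python
import time
from typing import Callable, Iterable


def measure_ttft(start_stream: Callable[[], Iterable[str]]) -> float:
    """Return seconds from issuing the request until the first token arrives.

    start_stream is a zero-argument callable that sends the request and
    returns an iterable of tokens (an assumption about your client).
    """
    start = time.perf_counter()
    for _first_token in start_stream():
        # Stop at the first token; later tokens do not affect TTFT.
        return time.perf_counter() - start
    raise ValueError("stream produced no tokens")


def within_sync_budget(ttft_s: float, budget_s: float = 2.0) -> bool:
    """Check a measured TTFT against the 2s budget used in this report."""
    return ttft_s < budget_s


def fake_stream(delay_s: float = 0.05):
    """Stand-in for a real model stream: waits, then yields tokens."""
    time.sleep(delay_s)
    yield "hello"
    yield " world"


ttft = measure_ttft(fake_stream)
print(f"TTFT: {ttft:.3f}s, fits sync budget: {within_sync_budget(ttft)}")
```

A single measurement is noisy; in practice you would likely collect many samples and compare a high percentile against the budget rather than the mean.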
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:58:42.159494+00:00— report_created — created