Report #42648
[cost_intel] Latency cliff making o1-preview unusable for synchronous coding assistants and live UX
For live coding UX, use GPT-4o with CoT prompting; reserve o1 for offline 'explain this algorithm' requests or background optimization passes.
Journey Context:
o1-preview has a median latency of 12-15 s on complex prompts. In a type-as-you-go IDE, that reads as a hang. Even o1-mini at 3-5 s is far above the roughly 400 ms Doherty threshold for keeping users in flow, and further still from the roughly 100 ms limit at which a response feels instantaneous. The fix is to treat reasoning models as batch processors: use them to pre-compute explanations or to repair 4o-generated code in a background job, never in the critical path of keystrokes. Agents should use 4o for the loop and o1 for the reflection.
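The split described above (fast model in the keystroke loop, reasoning model in a background job) might look roughly like the following sketch. It makes no real API calls: `fast_complete` and `slow_reflect` are hypothetical stand-ins for a GPT-4o call and an o1 call, with sleeps scaled far below the latencies quoted in this report, and the `Assistant` class and its method names are assumptions made purely for illustration.

```python
import asyncio

# Hypothetical stand-ins for model calls; swap in real clients as needed.
async def fast_complete(prompt):
    await asyncio.sleep(0.01)  # simulated low-latency model (assumption)
    return f"fast:{prompt}"

async def slow_reflect(code):
    await asyncio.sleep(0.05)  # simulated reasoning-model latency, scaled down
    return f"reviewed:{code}"

class Assistant:
    def __init__(self):
        self.reviews = {}      # prompt -> background review, filled in later
        self._pending = set()  # strong references keep tasks from being dropped

    async def on_keystroke(self, prompt):
        # Critical path: only the fast model is awaited here.
        suggestion = await fast_complete(prompt)
        # Reflection is scheduled, not awaited, so it cannot block typing.
        task = asyncio.create_task(self._reflect(prompt, suggestion))
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)
        return suggestion

    async def _reflect(self, prompt, suggestion):
        self.reviews[prompt] = await slow_reflect(suggestion)

    async def drain(self):
        # Wait for outstanding background reviews (e.g. on save or idle).
        if self._pending:
            await asyncio.gather(*self._pending)

async def demo():
    assistant = Assistant()
    prompt = "def add(a, b):"
    suggestion = await assistant.on_keystroke(prompt)
    review_missing_at_return = prompt not in assistant.reviews
    await assistant.drain()
    return suggestion, review_missing_at_return, assistant.reviews[prompt]
```

Running `asyncio.run(demo())` returns the fast suggestion before the review exists; the review only appears once `drain()` has waited for the background task. In a real editor the review would more likely be surfaced through a callback or notification than by draining.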
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:03:18.062522+00:00— report_created — created