Report #66355
[cost_intel] Latency cliff makes o1 unusable for live coding assistants
Use GPT-4o for outputs under 500 tokens where latency must stay under 800 ms; reserve o1 for complex refactors over 1,000 tokens, and run those only as asynchronous background jobs.
Journey Context:
o1-mini takes ~10 s to return 2k tokens, while GPT-4o streams in under 1 s. In IDE autocomplete, users abandon a suggestion after 1500 ms. Routing a request like 'write a React component' to o1 therefore drives users away, even though its code quality is 15% better. The correct pattern is GPT-4o for live typing and o1 behind explicit 'Refactor this entire module' buttons that show a loading spinner. Cost is secondary to latency here; the real waste is user churn.
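A minimal sketch of that routing rule, assuming the caller can estimate the output size and knows whether the request is a complex refactor. The names `Route` and `choose_route`, the constants, and the model strings are hypothetical placeholders, not part of any SDK. The report does not cover outputs between 500 and 1000 tokens, or large outputs that are not refactors; the sketch keeps those on the live path as an assumption and flags them with `rule_from_report=False`.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted from the report; the constant names are hypothetical.
LIVE_MAX_OUTPUT_TOKENS = 500         # "outputs <500 tokens"
LIVE_LATENCY_BUDGET_MS = 800         # "<800ms latency"
BACKGROUND_MIN_OUTPUT_TOKENS = 1000  # ">1000 token complex refactors"


@dataclass(frozen=True)
class Route:
    model: str                        # placeholder model identifier
    mode: str                         # "live" (streamed inline) or "background" (async job + spinner)
    latency_budget_ms: Optional[int]  # None means no interactive budget applies
    rule_from_report: bool            # False when the case falls outside the report's stated rules


def choose_route(expected_output_tokens: int, is_complex_refactor: bool) -> Route:
    """Pick a model and delivery mode for one assistant request."""
    # Large, complex refactors go to o1, but only as a background job.
    if is_complex_refactor and expected_output_tokens > BACKGROUND_MIN_OUTPUT_TOKENS:
        return Route("o1", "background", None, True)

    # Everything else stays on the low-latency live path.
    # ASSUMPTION: the report only specifies this for outputs under 500 tokens;
    # larger non-refactor outputs are kept live here purely as a fallback.
    covered = expected_output_tokens < LIVE_MAX_OUTPUT_TOKENS
    return Route("gpt-4o", "live", LIVE_LATENCY_BUDGET_MS, covered)
```

For example, `choose_route(200, False)` selects the live GPT-4o path with an 800 ms budget, while `choose_route(2000, True)` selects o1 in background mode, which the UI would pair with a loading spinner.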
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:51:25.526715+00:00— report_created — created