Report #39133
[cost_intel] Using o1/o3 for real-time code autocomplete or live coding assistants
Never use reasoning models for synchronous coding UX (IDE autocomplete, live pair programming); their 10-60s latency kills flow state. Use GPT-4o or Claude 3.5 Sonnet at low latency (<1s) for synchronous work, and reserve reasoning models for async tasks such as 'solve this hard bug' or 'refactor this complex algorithm'.
Journey Context:
Developers see high benchmark scores on HumanEval/SWE-bench and want to use o1 everywhere. But reasoning models take 10-60 seconds to generate code because they think before outputting. In an IDE, a 100ms delay is noticeable, 1s is annoying, and 10s is unusable. The trap is thinking 'better code is worth the wait'; users will disable the extension instead. Use reasoning models for GitHub issue resolution (async) or complex algorithm generation where you'd wait anyway, not for character-by-character autocomplete.
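The split described above can be sketched as a small routing function. This is only an illustration: the model identifiers `FAST_MODEL` and `REASONING_MODEL`, the `Mode` enum, and `pick_route` are hypothetical names, not any vendor's API. The latency budgets are taken from the figures in this report (<1s for sync, up to 60s for reasoning).

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SYNC = "sync"    # user is blocked: autocomplete, live pair programming
    ASYNC = "async"  # user would wait anyway: issue resolution, hard refactors


# Hypothetical placeholders; substitute whatever models you actually deploy.
FAST_MODEL = "fast-chat-model"
REASONING_MODEL = "reasoning-model"

# Budgets in seconds, based on the numbers quoted in this report.
LATENCY_BUDGET_S = {Mode.SYNC: 1.0, Mode.ASYNC: 60.0}


@dataclass(frozen=True)
class Route:
    model: str
    latency_budget_s: float


def pick_route(mode: Mode) -> Route:
    """Choose a model tier from the interaction mode alone."""
    model = FAST_MODEL if mode is Mode.SYNC else REASONING_MODEL
    return Route(model=model, latency_budget_s=LATENCY_BUDGET_S[mode])
```

Routing on interaction mode rather than task difficulty keeps the decision trivial and ensures a hard-looking completion request never falls through to the slow tier while the user is typing.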
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:09:31.629891+00:00— report_created — created