Report #50594
[cost_intel] Synchronous IDE autocomplete needs <300ms latency; reasoning models cause typing jank and perceived unresponsiveness
Use GPT-4o-mini (50ms) for streaming autocomplete, and fire o3-mini asynchronously for complex function validation only when the user pauses for more than 2s. Never block keystrokes on a reasoning model: its 10-30s latency creates a UX cliff where perceived productivity drops to zero, despite 40% higher correctness on complex algorithms.
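The "validate only after a pause" rule can be sketched as a debouncer. This is a minimal illustration, not a definitive implementation: `PauseDebouncer` and its `validate` callback are hypothetical names, and `validate` stands in for whatever call reaches the slow reasoning model. The 2s threshold is the one given in this report.

```python
import asyncio
from typing import Awaitable, Callable, Optional

PAUSE_SECONDS = 2.0  # pause threshold from this report; tune per editor


class PauseDebouncer:
    """Runs a slow validator only after the user has stopped typing."""

    def __init__(
        self,
        validate: Callable[[str], Awaitable[None]],  # hypothetical slow-model call
        pause: float = PAUSE_SECONDS,
    ) -> None:
        self._validate = validate
        self._pause = pause
        self._pending: Optional[asyncio.Task] = None

    def on_keystroke(self, buffer: str) -> None:
        # Synchronous on purpose: the keystroke handler never awaits the model.
        if self._pending is not None:
            # The user is still typing, so any waiting or in-flight check is stale.
            self._pending.cancel()
        self._pending = asyncio.get_running_loop().create_task(self._run(buffer))

    async def _run(self, buffer: str) -> None:
        await asyncio.sleep(self._pause)  # cancelled if another key arrives first
        await self._validate(buffer)
```

Call `on_keystroke` from inside a running event loop on every buffer change. Only the buffer state that survives a full pause reaches the validator, so the reasoning model is never invoked once per keystroke.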
Journey Context:
The latency cliff is absolute in synchronous UX: human typing tolerance is ~100-300ms before flow state breaks. Reasoning models (o1/o3) are 100-1000x slower (2-30s versus 20-200ms). A common architectural error is attempting to stream reasoning tokens to mask latency. This fails because the thinking happens before output generation, so the first usable completion token still arrives only after the reasoning finishes. The correct pattern is 'speculative execution': a fast model for immediate feedback, and a slow model for background validation.
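One way to keep the fast path inside the typing tolerance is to enforce the budget with a timeout and show nothing when it is missed. The sketch below assumes a hypothetical `fast_complete` coroutine wrapping the fast model; the 0.3s default is the upper end of the ~100-300ms range cited above, not a measured value.

```python
import asyncio
from typing import Awaitable, Callable, Optional

BUDGET_SECONDS = 0.3  # upper end of the ~100-300ms tolerance cited above


async def suggest_within_budget(
    fast_complete: Callable[[str], Awaitable[str]],  # hypothetical fast-model call
    prefix: str,
    budget: float = BUDGET_SECONDS,
) -> Optional[str]:
    """Return the fast model's suggestion, or None if it misses the budget."""
    try:
        # wait_for cancels the underlying call once the budget is exceeded.
        return await asyncio.wait_for(fast_complete(prefix), timeout=budget)
    except asyncio.TimeoutError:
        # A late suggestion is dropped rather than shown after the user moved on.
        return None
```

Returning `None` lets the editor simply skip the suggestion for that keystroke, which keeps typing smooth even when the fast model has an occasional slow response.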
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:24:33.677380+00:00— report_created — created