Report #94346
[cost_intel] Using reasoning models for live UI generation creates unacceptable TTFB
Use streaming instruct models (GPT-4o/Gemini Flash) for real-time UI generation; reserve reasoning models for offline design systems or complex accessibility compliance checks only
Journey Context:
o1 takes 10-20 seconds to generate a complex React component with accessibility considerations, while GPT-4o streams in under 2 seconds. In live UI builders or IDE autocomplete, a Time To First Byte (TTFB) above 500 ms feels broken. Reasoning models cannot usefully stream intermediate reasoning tokens for UI generation: they "think" first, then emit the output. Degradation signature: the cheaper model generates syntactically valid but semantically poor UI (bad state management, missing a11y labels); the reasoning model produces robust architecture, but too late for interactive use. Hybrid approach: use 4o for live generation, and run o1 asynchronously for tasks such as "refactor this component for WCAG 2.1 AA compliance". Cost per UI component: 4o $0.005, o1 $0.15. That 30x difference is not justified for visual layout.
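The hybrid split described above could be sketched roughly as follows. This is a minimal illustration, not a tested integration: the `UiTask` type, the `kind` labels, and the function names are hypothetical, no provider API is called, and the cost and latency figures are the rough estimates quoted in this report.

```python
from dataclasses import dataclass

# Figures quoted in this report; rough estimates, not measurements.
TTFB_BUDGET_MS = 500
COST_PER_COMPONENT_USD = {"gpt-4o": 0.005, "o1": 0.15}

# Hypothetical task kinds that justify a slow reasoning pass run asynchronously.
OFFLINE_KINDS = {"a11y_audit", "design_system"}


@dataclass(frozen=True)
class UiTask:
    """Hypothetical task descriptor; the field names are illustrative."""
    prompt: str
    kind: str  # e.g. "live", "a11y_audit", "design_system"


def pick_model(task: UiTask) -> str:
    """Send offline review work to the reasoning model, everything else to the streaming model."""
    return "o1" if task.kind in OFFLINE_KINDS else "gpt-4o"


def runs_async(task: UiTask) -> bool:
    """Reasoning-model jobs are queued in the background rather than awaited in the UI."""
    return pick_model(task) == "o1"


def cost_ratio() -> float:
    """How many times more one o1 component costs than one 4o component."""
    return COST_PER_COMPONENT_USD["o1"] / COST_PER_COMPONENT_USD["gpt-4o"]


def estimated_cost(tasks: list[UiTask]) -> float:
    """Sum the per-component estimates for a batch of tasks."""
    return sum(COST_PER_COMPONENT_USD[pick_model(t)] for t in tasks)
```

Routing on the task kind rather than on prompt content keeps the decision cheap enough to stay well inside the TTFB budget; a real builder would likely need a richer signal for when an asynchronous o1 pass is worth its cost.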
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:56:47.152485+00:00— report_created — created