Report #29504
[cost_intel] High-latency reasoning models blocking synchronous UI threads
Cap reasoning effort (low/medium) for <2s UI paths; offload heavy reasoning to async background jobs with polling/webhooks.
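One way to read the rule above is as a small routing decision made before any model call. The following is a minimal sketch, not a verified workaround: `route_request`, `UI_BUDGET_S`, and the `Route` shape are hypothetical names, and the 2-second threshold is taken from the "<2s UI paths" wording. How the chosen effort level is passed to a provider is not shown; check the provider's documentation for the actual parameter.

```python
from typing import Literal, TypedDict

Effort = Literal["low", "medium", "high"]


class Route(TypedDict):
    mode: Literal["sync", "async"]
    reasoning_effort: Effort


# Assumed threshold, taken from the "<2s UI paths" rule in this report.
UI_BUDGET_S = 2.0


def route_request(requested_effort: Effort, latency_budget_s: float) -> Route:
    """Decide whether a request may run inline or must go to a background job.

    On a tight UI path, only low/medium effort is allowed to run synchronously;
    high-effort requests are offloaded. Outside the UI budget, the request
    runs as asked.
    """
    on_ui_path = latency_budget_s < UI_BUDGET_S
    if on_ui_path and requested_effort == "high":
        # Heavy reasoning: hand off to a background job, keep the effort level.
        return {"mode": "async", "reasoning_effort": "high"}
    return {"mode": "sync", "reasoning_effort": requested_effort}
```

For example, `route_request("high", 1.5)` routes to a background job, while `route_request("medium", 1.5)` stays inline.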
Journey Context:
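Reasoning models such as o1 and o3-mini can take 10-60s on complex code generation, while users tend to abandon an interaction after about 3s. Common mistake: calling a reasoning model at high effort (e.g. o3-mini with high reasoning effort) directly from a React onClick handler and making the user wait for the response. Instead, stream a placeholder response from a cheap, fast model, then queue the reasoning job on a background worker (Celery, BullMQ) and deliver the result via polling or a webhook. Tradeoff: the UI becomes eventually consistent (the full answer arrives later) in exchange for perceived speed. Never block a user-facing request path on a reasoning model.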
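The submit-then-poll pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not a verified workaround: a `ThreadPoolExecutor` stands in for a real queue such as Celery or BullMQ, `slow_reasoning_call` is a hypothetical placeholder for the model request (shortened to a fraction of a second here), and `submit_reasoning_job` / `poll_job` are made-up names for the two endpoints a UI would call.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict

# Stand-in for a real job queue (Celery, BullMQ); in-process only.
_executor = ThreadPoolExecutor(max_workers=2)
_jobs: Dict[str, Future] = {}


def slow_reasoning_call(prompt: str) -> str:
    """Hypothetical placeholder for a reasoning-model request."""
    time.sleep(0.2)  # a real call may take 10-60s
    return f"detailed answer for: {prompt}"


def submit_reasoning_job(prompt: str) -> str:
    """Enqueue the slow call and return immediately with a job id."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = _executor.submit(slow_reasoning_call, prompt)
    return job_id


def poll_job(job_id: str) -> dict:
    """Report job state without blocking the caller."""
    future = _jobs.get(job_id)
    if future is None:
        return {"status": "unknown"}
    if not future.done():
        return {"status": "pending"}
    error = future.exception()
    if error is not None:
        return {"status": "failed", "error": str(error)}
    return {"status": "done", "result": future.result()}
```

The UI handler would call `submit_reasoning_job`, show the cheap model's placeholder right away, and then call `poll_job` on a timer until the status is `done` or `failed`. A production setup would also need persistence and job expiry, which this sketch leaves out.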
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:54:50.335516+00:00— report_created — created