Report #40538
[frontier] LLM reasoning models (o1/o3) taking too long to respond and blocking UI with no feedback
Implement Streaming Reasoning Tokens for Progressive UI: consume 'reasoning' tokens (e.g. a 'reasoning_content' field, where the provider exposes one) as a separate Server-Sent Events stream to display 'thinking' indicators, and use intermediate reasoning text or summaries to trigger pre-fetching or validation tools before the final answer arrives.
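A minimal sketch of the stream split follows. It assumes OpenAI-compatible streaming chunks in which each delta may carry 'reasoning_content' (as DeepSeek's reasoner API does) and/or 'content'. OpenAI's own o1/o3 models do not stream raw reasoning, so for them the reasoning channel would carry summary text instead. The event names 'reasoning', 'thinking_done', 'answer' and 'done' are hypothetical choices for the frontend contract, not a provider API.

```python
import json
from typing import Any, Iterable, Iterator


def sse_frame(event: str, data: Any) -> str:
    """Format one Server-Sent Events frame; JSON-encoding keeps the payload on one line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def demux_to_sse(chunks: Iterable[dict]) -> Iterator[str]:
    """Split one provider stream into 'reasoning' and 'answer' SSE events.

    Assumption: chunks look like OpenAI-compatible deltas, where a delta may hold
    'reasoning_content' (DeepSeek-style) and/or 'content'. Event names are hypothetical.
    """
    answer_started = False
    for chunk in chunks:
        # Some streams end with a usage-only chunk whose 'choices' list is empty.
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            reasoning = delta.get("reasoning_content")
            content = delta.get("content")
            if reasoning:
                yield sse_frame("reasoning", {"text": reasoning})
            if content:
                if not answer_started:
                    # First answer token: the UI can collapse its 'Thinking...' panel.
                    answer_started = True
                    yield sse_frame("thinking_done", {})
                yield sse_frame("answer", {"text": content})
    yield sse_frame("done", {})


if __name__ == "__main__":
    fake_stream = [
        {"choices": [{"delta": {"reasoning_content": "Need the calendar."}}]},
        {"choices": [{"delta": {"content": "You are free at 3pm."}}]},
    ]
    for frame in demux_to_sse(fake_stream):
        print(frame, end="")
```

In a server, the generator would be handed to the framework's streaming response with media type 'text/event-stream' (for example Starlette's StreamingResponse). Sending one SSE connection with distinct event types, rather than two connections, keeps reasoning and answer events in order on the client.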
Journey Context:
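With the release of reasoning models (OpenAI o1/o3, Anthropic extended thinking), agents face a latency problem: these models can spend roughly 10-60 seconds 'thinking' before emitting the first answer token, which causes UI timeouts and leaves users with no feedback. The fix treats reasoning not as a black box but as a first-class stream.

How reasoning is exposed depends on the provider. OpenAI's o1/o3 models do not return their raw reasoning tokens; the Responses API can stream reasoning summaries as separate events. OpenAI-compatible APIs such as DeepSeek's reasoner stream a 'reasoning_content' field alongside the usual 'content' field, and Anthropic's extended thinking streams 'thinking' content blocks before the 'text' blocks. In each case the client receives one stream and splits it into two channels: one for reasoning (driving a 'Thinking...' UI with collapsible reasoning steps) and one for the final answer. A backend can re-expose these to the frontend as two SSE streams or as one SSE stream with distinct event types.

Advanced implementations parse the reasoning text for intent signals (e.g., 'I need to check the user's calendar') and start the matching tool calls speculatively, before the model finishes reasoning, to reduce total latency. This replaces 'await model.response()' patterns with event-driven architectures that handle reasoning as a progressive-disclosure stream.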
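The speculative-prefetch idea can be sketched under several assumptions. Intents are detected with hand-written regular expressions over the accumulated reasoning text, a deliberately naive heuristic that a small classifier could replace. 'SpeculativePrefetcher', 'fake_calendar' and the intent table are hypothetical names. The reasoning deltas are whatever the provider exposes (raw thinking text, or only a summary for OpenAI o1/o3).

```python
from __future__ import annotations

import asyncio
import re
from typing import AsyncIterator, Awaitable, Callable

# Hypothetical intent table: name -> (pattern over reasoning text, prefetch coroutine factory).
IntentTable = dict[str, tuple[re.Pattern[str], Callable[[], Awaitable[object]]]]


class SpeculativePrefetcher:
    """Starts read-only tool calls as soon as the reasoning text hints they will be needed."""

    def __init__(self, intents: IntentTable) -> None:
        self._intents = intents
        self._buffer = ""  # accumulated reasoning, since a phrase may span several deltas
        self.tasks: dict[str, asyncio.Task] = {}

    def feed(self, reasoning_delta: str) -> None:
        """Append a reasoning delta and start each newly matched prefetch at most once."""
        self._buffer += reasoning_delta
        for name, (pattern, fetch) in self._intents.items():
            if name not in self.tasks and pattern.search(self._buffer):
                self.tasks[name] = asyncio.create_task(fetch())

    async def result(self, name: str) -> object | None:
        """Return the speculative result, or None if speculation never started."""
        task = self.tasks.get(name)
        return await task if task else None

    def cancel_unused(self, used: set[str]) -> None:
        """Cancel prefetches the model never actually asked for."""
        for name, task in self.tasks.items():
            if name not in used:
                task.cancel()


async def fake_calendar() -> list[str]:
    await asyncio.sleep(0.01)  # stands in for a real calendar API call
    return ["15:00 free"]


async def fake_reasoning() -> AsyncIterator[str]:
    # Simulated reasoning deltas; note the intent phrase is split across two chunks.
    for piece in ["The user asks about 3pm. I need to check the user's cal", "endar first."]:
        await asyncio.sleep(0)
        yield piece


async def main() -> object:
    pre = SpeculativePrefetcher(
        {"calendar": (re.compile(r"check the user's calendar", re.I), fake_calendar)}
    )
    async for delta in fake_reasoning():
        pre.feed(delta)
    # Later, when the model emits its real calendar tool call, reuse the speculative result.
    return await pre.result("calendar")


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['15:00 free']
```

Speculation is only safe for side-effect-free calls (reads, validation), because the model may never request them; 'cancel_unused' drops leftovers once the final tool calls are known. Re-scanning the whole buffer on every delta is quadratic in the worst case; scanning only a trailing window is a cheap refinement.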
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:31:00.390311+00:00— report_created — created