Report #44953
[synthesis] Waiting for the LLM to finish generating a complete response before starting tool execution or UI rendering creates unacceptable latency
Stream LLM output token-by-token and trigger side effects (like API calls, UI updates, or tool executions) speculatively or as soon as the required parameter is fully generated
Journey Context:
If an LLM needs to call a tool such as `get_weather(city)`, a traditional loop waits for the entire JSON block, parses it, and only then calls the API, which can add seconds of latency. Modern architectures stream the output instead: as soon as the `city` value is closed (its closing quote has arrived), the API call is dispatched in parallel while the LLM continues generating the rest of the thought. Because the dispatch is speculative, the early value should be checked against the fully parsed arguments before the result is used. Perplexity does this by streaming citations and fetching page metadata concurrently with the text generation, trading backend complexity for perceived frontend speed.
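The early-dispatch idea can be sketched roughly as follows. This is a minimal illustration, not a production implementation: `fake_llm_stream`, `get_weather`, and `run_tool_call` are hypothetical stand-ins, the tool-call JSON shape (`name` / `arguments`) is assumed, and the regex shortcut assumes a flat argument object in which `"city"` appears only once as a key. A real system would more likely use an incremental JSON parser.

```python
import asyncio
import json
import re

# Matches a fully closed JSON string value for the "city" key.
# Assumption: "city" appears only once, as a key, in the streamed call.
CITY_RE = re.compile(r'"city"\s*:\s*("(?:[^"\\]|\\.)*")')


async def fake_llm_stream(chunks, delay=0.0):
    """Hypothetical stand-in for a streaming LLM client."""
    for chunk in chunks:
        await asyncio.sleep(delay)
        yield chunk


async def get_weather(city):
    """Hypothetical tool; a real one would call a weather API."""
    await asyncio.sleep(0)
    return f"weather for {city}"


async def run_tool_call(stream):
    """Consume the stream and dispatch get_weather as soon as `city` closes.

    Returns (tool_result, parsed_arguments, index_of_chunk_that_triggered).
    """
    buffer = ""
    task = None
    early_city = None
    dispatched_at = None
    index = -1

    async for chunk in stream:
        index += 1
        buffer += chunk
        if task is None:
            match = CITY_RE.search(buffer)
            if match:
                # The value is complete: start the call without waiting
                # for the rest of the JSON block.
                early_city = json.loads(match.group(1))
                task = asyncio.create_task(get_weather(early_city))
                dispatched_at = index

    # The stream has ended: parse the full block and validate the guess.
    args = json.loads(buffer)["arguments"]
    if task is None or early_city != args["city"]:
        if task is not None:
            task.cancel()  # discard the speculative call
        task = asyncio.create_task(get_weather(args["city"]))
        dispatched_at = index

    return await task, args, dispatched_at
```

A possible way to exercise it, with the tool call split across arbitrary chunk boundaries:

```python
chunks = ['{"name": "get_weather", ', '"arguments": {"ci', 'ty": "Par',
          'is", ', '"units": "metric"}}']
result, args, dispatched_at = asyncio.run(run_tool_call(fake_llm_stream(chunks)))
```

Here the call is dispatched on the fourth chunk, before the `units` argument has arrived. The final comparison against the parsed arguments is what keeps a wrong speculative call from leaking into the result.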
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:55:19.401073+00:00— report_created — created