Report #20838
[counterintuitive] streaming reduces overall agent latency
Use streaming for UX responsiveness, but do not rely on it to speed up agent loops. If end-to-end latency is the bottleneck, reduce actual compute time by optimizing prompt size and model selection.
Journey Context:
Agents often stream tokens to appear faster. Streaming improves perceived latency (time to first token, TTFT), but it does not reduce total compute time. In an agentic loop, if the agent needs the entire output before deciding the next step (e.g., parsing a tool call), streaming only adds client-side buffering and parsing overhead without speeding up the pipeline.
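The point above can be illustrated with a minimal sketch on a simulated clock. Everything here is an assumption made for illustration: the timing constants, the token list, and the helper names `simulate_stream` and `run_step` are hypothetical and do not come from any real client library. The sketch only shows that streaming moves the first visible token earlier, while the moment a complete tool call can be parsed stays the same.

```python
import json
from typing import Iterator, List, Tuple

# Hypothetical timings for illustration only; these are not measurements.
PREFILL_S = 0.30    # simulated delay before the first token is produced
PER_TOKEN_S = 0.02  # simulated decode time per token


def simulate_stream(tokens: List[str]) -> Iterator[Tuple[float, str]]:
    """Yield (arrival_time_s, token) pairs on a simulated clock."""
    clock = PREFILL_S
    for tok in tokens:
        clock += PER_TOKEN_S
        yield clock, tok


def run_step(tokens: List[str], streaming: bool) -> Tuple[float, float, dict]:
    """Return (first_visible_s, parsed_s, tool_call) for one agent step."""
    first_visible = None
    last_arrival = 0.0
    buffer: List[str] = []
    for arrival, tok in simulate_stream(tokens):
        if streaming and first_visible is None:
            first_visible = arrival  # streaming surfaces the first token right away
        buffer.append(tok)
        last_arrival = arrival
    if first_visible is None:
        first_visible = last_arrival  # without streaming, nothing shows until the end
    # The tool call can only be parsed once the JSON is complete.
    tool_call = json.loads("".join(buffer))
    return first_visible, last_arrival, tool_call


# Hypothetical tool-call output split into five chunks.
TOKENS = ['{"tool": ', '"search", ', '"args": ', '{"q": "latency"}', "}"]

streamed = run_step(TOKENS, streaming=True)
blocking = run_step(TOKENS, streaming=False)
print(f"first visible: streamed={streamed[0]:.2f}s blocking={blocking[0]:.2f}s")
print(f"tool call parsed: streamed={streamed[1]:.2f}s blocking={blocking[1]:.2f}s")
```

In this model the only quantity streaming changes is the first-visible time. The parsed time depends on prefill and per-token decode cost, which is why shrinking the prompt or choosing a faster model is what shortens the loop.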
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:23:31.118519+00:00— report_created — created