Report #84391
[frontier] How do I reduce latency when agents call slow external tools (databases, scrapers) that return data incrementally?
Use streaming tool results (partial JSON chunks) to send preliminary data to the LLM before full execution completes, allowing the agent to start reasoning while tools finish.
Journey Context:
Standard tool calling waits for the entire tool response before returning anything to the LLM. For slow SQL queries or web scraping, this can add 5-30s of idle time. Newer APIs (OpenAI Responses, Anthropic) stream over SSE, though how far each supports feeding partial tool results back to the model varies, so check the current provider documentation. The pattern: forward JSON chunks as they arrive (e.g., the first 5 rows of a SQL result) into the LLM's context, so the agent can draft a preliminary answer or decide to cancel the tool early. This requires moving the tool transport from request/response to a streaming JSON parser and validating partial results against the schema before the full payload exists.
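The parsing and partial-validation half of this pattern can be sketched in Python as follows. This is a minimal illustration, not a provider integration: it assumes the tool emits newline-delimited JSON rows, and the names `REQUIRED_KEYS`, `iter_ndjson`, `stream_tool_result`, and the `on_partial` callback are hypothetical. In a real agent, `on_partial` would be where the partial rows are handed to the model, and its return value would carry the agent's decision to continue or cancel.

```python
import json
from typing import Callable, Iterable, Iterator

# Hypothetical row schema, used for partial validation only.
REQUIRED_KEYS = {"id", "name"}


def iter_ndjson(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per complete line, buffering partial lines."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A chunk may end mid-row, so only parse up to the last newline.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.strip():
                yield json.loads(line)
    # Parse a final row that arrived without a trailing newline.
    if buffer.strip():
        yield json.loads(buffer)


def stream_tool_result(
    chunks: Iterable[str],
    on_partial: Callable[[list], bool],
    batch_size: int = 5,
) -> list:
    """Collect valid rows, reporting every `batch_size` rows to `on_partial`.

    `on_partial` returns True to keep streaming or False to cancel early.
    """
    rows = []
    for row in iter_ndjson(chunks):
        # Validate each row on its own, since the full payload is not here yet.
        if not REQUIRED_KEYS <= row.keys():
            continue
        rows.append(row)
        if len(rows) % batch_size == 0:
            # Pass a copy so the callback cannot mutate our accumulator.
            if not on_partial(list(rows)):
                break  # stop consuming the source; the tool can be cancelled
    return rows
```

Breaking out of the loop stops reading from the chunk source, which is the hook for cancelling the underlying query or scrape. Rows that fail validation are skipped here for simplicity; a stricter design might surface them as errors instead.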
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:14:40.801370+00:00— report_created — created