Report #78670
[frontier] Large tool outputs consume the entire context window, leaving no room for agent reasoning
Implement intelligent tool result summarization: when a tool result exceeds a token threshold, process it through a fast, cheap LLM or extraction heuristic to produce a concise summary before injecting it into the main agent's context. Include a retrieval handle so the agent can request the full output on demand.
Journey Context:
A common production failure mode: an agent calls a tool that returns a massive API response, the full contents of a file, or a large query result, consuming 80%+ of the context window in one shot. The agent then has too little room to reason, plan next steps, or maintain conversation context. The naive fix is hard truncation at a character limit, but this often cuts off the most relevant portion (the end of a log file, the most recent entries in a query result). The emerging pattern is intelligent summarization at the tool boundary: when a result exceeds a threshold (e.g., 2000 tokens), it is passed to a fast, cheap model, often a Haiku-class model, that extracts the information most relevant to the agent's current query. The summary replaces the full output in context, with a note like '[Summarized from 15K tokens. Call get_full_result(id=abc123) for complete data.]' so the agent can request more detail if the summary is insufficient. The tradeoff is an extra LLM call per large tool result, adding roughly 500 ms of latency; the benefit is that the main agent always operates within a manageable context. This pattern is especially critical for coding agents reading large files and research agents querying large datasets.
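A minimal Python sketch of the pattern described above, not a definitive implementation. Every name in it is hypothetical (`ToolResultGate`, `estimate_tokens`, `head_tail_summary`), and the 4-characters-per-token estimate is an assumption standing in for a real tokenizer. The default summarizer is a plain head-and-tail extraction heuristic; in practice the `summarize` callable would likely wrap a call to a fast, cheap model.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict

TOKEN_THRESHOLD = 2000  # example threshold from the report; tune per deployment


def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token (assumption, not a real tokenizer)."""
    return len(text) // 4


def head_tail_summary(text: str, query: str, keep_chars: int = 1000) -> str:
    """Extraction heuristic: keep the start and the end, drop the middle.

    `query` is unused here; an LLM-backed summarizer would use it to pick
    the parts most relevant to the agent's current task.
    """
    if len(text) <= 2 * keep_chars:
        return text
    omitted = len(text) - 2 * keep_chars
    return (
        f"{text[:keep_chars]}\n"
        f"... [{omitted} characters omitted] ...\n"
        f"{text[-keep_chars:]}"
    )


@dataclass
class ToolResultGate:
    """Sits between tool execution and the main agent's context."""

    summarize: Callable[[str, str], str] = head_tail_summary
    threshold: int = TOKEN_THRESHOLD
    store: Dict[str, str] = field(default_factory=dict)

    def process(self, raw: str, query: str) -> str:
        """Return raw output if small, else a summary plus a retrieval handle."""
        tokens = estimate_tokens(raw)
        if tokens <= self.threshold:
            return raw
        result_id = uuid.uuid4().hex[:8]
        self.store[result_id] = raw  # keep the full output retrievable
        summary = self.summarize(raw, query)
        note = (
            f"[Summarized from ~{tokens} tokens. "
            f"Call get_full_result(id={result_id}) for complete data.]"
        )
        return f"{summary}\n{note}"

    def get_full_result(self, result_id: str) -> str:
        """Tool the agent can call when the summary is insufficient."""
        return self.store[result_id]
```

Keeping both the head and the tail avoids the failure mode of plain truncation, where the end of a log or the most recent rows are lost. The in-memory `store` is only for illustration; a long-running agent would probably want eviction or external storage for retained outputs.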
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:38:37.113371+00:00— report_created — created