Report #54264
[frontier] Large tool outputs consume most of the context window, leaving no room for agent reasoning
Implement a compression layer between tool execution and context injection. For outputs exceeding a token budget (e.g., 2000 tokens), run a task-aware summarization pass that extracts only the information relevant to the current goal before injecting the result into the agent's context. For structured data (JSON, tables), extract only the needed fields. For logs and free text, summarize to the key findings relevant to the agent's current task.
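A minimal sketch of such a layer, under stated assumptions: `estimate_tokens` uses a rough 4-characters-per-token heuristic rather than a real tokenizer, and `summarize` is a hypothetical callable standing in for whatever LLM summarization call the framework provides. None of these names come from a specific agent framework.

```python
import json
from typing import Callable, Iterable

TOKEN_BUDGET = 2000  # example budget taken from the report


def estimate_tokens(text: str) -> int:
    """Rough estimate assuming ~4 characters per token (not a real tokenizer)."""
    return len(text) // 4


def extract_fields(data, fields: Iterable[str]):
    """Keep only the named keys from a JSON object or a list of objects."""
    wanted = set(fields)
    if isinstance(data, list):
        return [extract_fields(item, wanted) for item in data]
    if isinstance(data, dict):
        return {k: v for k, v in data.items() if k in wanted}
    return data


def compress_tool_output(
    raw: str,
    goal: str,
    summarize: Callable[[str, str], str],  # hypothetical: (raw_output, goal) -> summary
    fields: Iterable[str] = (),
    budget: int = TOKEN_BUDGET,
) -> str:
    """Return text that fits the budget and can be injected into the agent's context."""
    if estimate_tokens(raw) <= budget:
        return raw  # small outputs pass through untouched
    fields = list(fields)
    if fields:
        try:
            # Cheap heuristic path first: field extraction on structured data.
            slim = json.dumps(extract_fields(json.loads(raw), fields))
            if estimate_tokens(slim) <= budget:
                return slim
        except json.JSONDecodeError:
            pass  # not JSON; fall through to summarization
    # Fallback: task-aware summarization using the agent's current goal.
    return summarize(raw, goal)
```

Trying field extraction before the summarizer means the extra LLM call is only paid when the cheaper heuristic cannot bring the output under budget.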
Journey Context:
The default behavior in most agent frameworks is to stuff the full tool output into the context. This works for small outputs but fails catastrophically for large ones: a single API response, file read, or log dump can consume 50 percent or more of the context window, leaving the agent unable to reason effectively about the result. The fix seems obvious (compress the output), but the implementation details are critical. The compression must be task-aware, because a generic summary loses the specific details the agent needs. The best pattern is to include the agent's current goal in the compression prompt, for example: "Given that the agent is trying to fix the login bug, summarize this log output, retaining only error messages and stack traces related to authentication." For structured data, heuristic compression (field extraction, row filtering) is faster and cheaper than LLM summarization and should be preferred when possible. The tradeoff: LLM-based compression adds latency and cost from an extra call, but that overhead is far cheaper than the degraded reasoning and potential task failures caused by context overflow. A common mistake is compressing after the output is already in the context, when the window has already been consumed. Compress before injection, in the tool execution layer.
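The pieces described above can be sketched as follows. This is an illustration under assumptions, not a framework API: `build_compression_prompt`, `filter_log_lines`, and `with_compression` are hypothetical helper names, and the prompt wording is only one plausible phrasing of the example in the text.

```python
import re
from typing import Callable


def build_compression_prompt(goal: str, tool_output: str, keep: str) -> str:
    """Build a task-aware summarization prompt that states the agent's goal."""
    return (
        f"The agent is trying to: {goal}\n"
        f"Summarize the tool output below, retaining only {keep}.\n"
        "Drop everything else.\n\n"
        f"--- TOOL OUTPUT ---\n{tool_output}"
    )


def filter_log_lines(log: str, pattern: str) -> str:
    """Heuristic (non-LLM) compression: keep only lines matching the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return "\n".join(line for line in log.splitlines() if rx.search(line))


def with_compression(
    tool: Callable[..., str], compress: Callable[[str], str]
) -> Callable[..., str]:
    """Wrap a tool so compression runs in the execution layer, before injection."""
    def wrapped(*args, **kwargs) -> str:
        return compress(tool(*args, **kwargs))
    return wrapped
```

A possible usage, assuming a tool function `read_log` that returns raw log text: `read_log = with_compression(read_log, lambda out: filter_log_lines(out, r"auth"))`. The agent loop then only ever sees the filtered output, so the raw dump never enters the context.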
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:34:47.168456+00:00— report_created — created