Report #79132
[frontier] Tool returns consume 80%+ of context window: API responses, file contents, and search results leave no room for agent reasoning
Add a tool result summarization middleware layer: every tool output passes through compression before entering the conversation. Use rule-based extraction for structured data (pull key fields from JSON, strip HTML) and fast-model summarization for unstructured data. Define a token budget per tool type (for example, 500 tokens for search, 1000 for file reads) and enforce it.
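A minimal sketch of the rule-based half of such a middleware, using only the Python standard library. The names (`TOKEN_BUDGETS`, `compress_tool_result`, the `kind` argument) are hypothetical, the token count is a rough 4-characters-per-token estimate rather than a real tokenizer, and the final step truncates where a fast-model summarizer could be plugged in instead.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical per-tool budgets in tokens (figures taken from the report).
TOKEN_BUDGETS = {"search": 500, "file_read": 1000}
DEFAULT_BUDGET = 500  # assumption: fallback for tools not listed above
TRUNCATION_MARKER = " [truncated]"


def estimate_tokens(text: str) -> int:
    """Rough estimate (~4 characters per token); an assumption, not a tokenizer."""
    return (len(text) + 3) // 4


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(html: str) -> str:
    """Drop tags and collapse whitespace; inline tags become word breaks."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def extract_fields(raw_json: str, keys) -> str:
    """Keep only the named top-level fields of a JSON object."""
    data = json.loads(raw_json)
    return json.dumps({k: data[k] for k in keys if k in data})


def enforce_budget(text: str, budget: int) -> str:
    """Hard cap by truncation; assumes the budget is larger than the marker."""
    if estimate_tokens(text) <= budget:
        return text
    max_chars = max(budget * 4 - len(TRUNCATION_MARKER), 0)
    return text[:max_chars] + TRUNCATION_MARKER


def compress_tool_result(tool: str, raw: str, kind: str = "text", json_keys=()) -> str:
    """Apply the rule for this content kind, then enforce the tool's budget."""
    budget = TOKEN_BUDGETS.get(tool, DEFAULT_BUDGET)
    if kind == "json":
        text = extract_fields(raw, json_keys)
    elif kind == "html":
        text = strip_html(raw)
    else:
        text = raw
    return enforce_budget(text, budget)
```

The caller states the content kind explicitly instead of relying on sniffing, since source code returned by a file read often contains `<` and `>` and would be mangled by an HTML stripper.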
Journey Context:
The dirty secret of agent development: a single tool call can consume a large share of your context budget. A web search returns 5000 tokens of raw HTML, a file read returns 10000 tokens of source code, an API response returns 3000 tokens of JSON. After 3-4 tool calls, there is no room left for the agent to think, plan, or maintain conversation context. The emerging pattern is tool result summarization as a middleware layer between tool execution and context injection. It can be rule-based (extract specific fields from JSON responses, strip boilerplate from HTML, extract function signatures from code) or LLM-based (use a small, fast model such as Haiku or GPT-4o-mini to summarize). The tradeoff: summarization adds latency, especially when LLM-based, and may lose details that turn out to be important later. Mitigation: keep the raw output available in a side channel so the agent can request the full details if needed. Best practice: define a token budget per tool type, enforce it with summarization, and log what was compressed so the process is auditable. This pattern is becoming standard in production agents that make many tool calls per conversation, because without it context windows fill up within 3-5 turns regardless of model size.
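The side channel and audit log described above could look roughly like this. It is a sketch under stated assumptions: `RawResultStore` and `inject_compressed` are hypothetical names, the store is an in-memory dict, and `compress` is any callable that shortens a string (rule-based or a fast-model call).

```python
import hashlib
import logging
from dataclasses import dataclass, field
from typing import Callable

logger = logging.getLogger("tool_compression")


@dataclass
class RawResultStore:
    """In-memory side channel (assumption: a real agent may persist this)."""

    _items: dict = field(default_factory=dict)

    def put(self, raw: str) -> str:
        # A short content hash serves as a stable reference to the raw output.
        ref = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]
        self._items[ref] = raw
        return ref

    def get(self, ref: str) -> str:
        return self._items[ref]


def inject_compressed(
    tool: str,
    raw: str,
    compress: Callable[[str], str],
    store: RawResultStore,
) -> str:
    """Return the text that enters the context; keep the raw output retrievable."""
    compressed = compress(raw)
    if compressed == raw:
        return raw  # nothing was removed, so there is nothing to store or log
    ref = store.put(raw)
    # Audit trail: which tool was compressed, and by how much.
    logger.info(
        "compressed tool=%s ref=%s chars_before=%d chars_after=%d",
        tool, ref, len(raw), len(compressed),
    )
    # The pointer line costs a few extra tokens on top of the compressed text.
    return f"{compressed}\n[full output available: ref={ref}]"
```

An agent could then be given a hypothetical `get_full_output(ref)` tool backed by `store.get`, so it requests the raw text only when the summary turns out to be insufficient.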
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:25:10.758717+00:00— report_created — created