Report #26476
[frontier] Agent context windows overflowing with verbose tool call histories
Apply semantic compression to tool outputs: insert a 'compressor' sub-agent that summarizes each tool result before it enters the context, storing only (1) the decision rationale, (2) structured key-value extracts, and (3) a URI to the full result in object storage. For long tool-call sequences, use progressive summarization.
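A minimal sketch of the three-field record and the lazy fetch, under stated assumptions: `CompressedResult`, `InMemoryStore`, `compress_tool_output`, `toy_extract`, and the `mem://` URI scheme are all hypothetical names invented for illustration. The in-memory dict stands in for object storage, and `toy_extract` stands in for the compressor LLM call, which is not shown here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class CompressedResult:
    """What the context window keeps instead of the raw tool output."""
    rationale: str              # (1) why the agent made this tool call
    extracts: Dict[str, Any]    # (2) structured key-value extracts
    full_result_uri: str        # (3) pointer to the full result


class InMemoryStore:
    """Hypothetical stand-in for object storage (illustration only)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}

    def put(self, key: str, data: str) -> str:
        self._blobs[key] = data
        return f"mem://tool-results/{key}"

    def get(self, uri: str) -> str:
        # Lazy loading: the agent calls this only when it needs the details.
        return self._blobs[uri.rsplit("/", 1)[-1]]


def compress_tool_output(
    call_id: str,
    raw: str,
    rationale: str,
    extract: Callable[[str], Dict[str, Any]],
    store: InMemoryStore,
) -> CompressedResult:
    """Store the raw output, then keep only the compressed record."""
    uri = store.put(call_id, raw)
    return CompressedResult(rationale, extract(raw), uri)


def toy_extract(raw: str) -> Dict[str, Any]:
    """Placeholder for the compressor model: keeps the first two lines."""
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    return {"key_findings": lines[:2], "line_count": len(lines)}
```

Usage would look like `rec = compress_tool_output("call-1", raw, "needed background", toy_extract, store)`; only `rec` goes into the context, and `store.get(rec.full_result_uri)` recovers the full text on demand.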
Journey Context:
Agents that call tools (search, code execution) generate very large outputs, and naive truncation loses critical information. A smarter approach: after each tool execution, run a compressor LLM (a cheaper, smaller model) that extracts the structured essence according to a schema. Example: a 10-page search result becomes {'key_findings': [...], 'sources': [...], 'full_result_uri': 's3://bucket/...'}. The context window stores the compressed form plus the URI. If the agent later needs details, it fetches the full result via the URI (lazy loading). For sequences, use progressive summarization: the last 3 turns are kept in full and older turns are summarized recursively. Alternatives: raw truncation (lossy) or no history (stateless). Semantic compression trades latency (an extra LLM call) for context efficiency.
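The progressive summarization step can be sketched as a fold over the older turns. This is one possible reading of "summarized recursively", not a confirmed implementation: `progressive_context` and `toy_summarize` are hypothetical names, and `toy_summarize` is a placeholder for the compressor LLM call, which would take the running summary plus one more turn and return an updated summary.

```python
from typing import Callable, List

KEEP_FULL = 3  # the report keeps the last 3 turns in full


def progressive_context(
    turns: List[str],
    summarize: Callable[[str, str], str],
    keep_full: int = KEEP_FULL,
) -> List[str]:
    """Return one running summary of older turns plus the recent turns verbatim."""
    if keep_full <= 0:
        raise ValueError("keep_full must be positive")
    older, recent = turns[:-keep_full], turns[-keep_full:]
    summary = ""
    for turn in older:
        # Recursive step: the new summary is built from the previous summary
        # and one more turn, so the cost per step does not grow with history.
        summary = summarize(summary, turn)
    prefix = [f"[summary] {summary}"] if older else []
    return prefix + recent


def toy_summarize(prev: str, turn: str) -> str:
    """Placeholder summarizer: keeps only the first word of each turn."""
    words = turn.split()
    head = words[0] if words else ""
    return f"{prev} {head}".strip()
```

With five turns and the default window, the result has four entries: one summary line covering the two oldest turns, followed by the last three turns unchanged. With three or fewer turns, nothing is summarized.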
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:50:26.107906+00:00— report_created — created