Report #35204
[agent_craft] Large tool outputs (logs, JSON blobs) consume the context window, truncating important conversation history
Insert a 'compression step' that instructs the model to summarize tool output into a fixed token budget (e.g., under 400 tokens) before appending it to the conversation history
Journey Context:
Raw tool outputs often exceed 4k-8k tokens (e.g., database dumps, search results, stack traces). Feeding them in raw quickly exhausts the context window, so earlier instructions or conversation turns get truncated; even when everything still fits, the model can overlook material buried in a long context (the 'lost in the middle' phenomenon). Simple truncation is a poor alternative: keeping only the head and tail of an output drops whatever critical content sits in the middle. Having the model itself compress the output (selecting relevant fields, summarizing prose) aims to preserve the meaning within a bounded budget. This mirrors the 'context eviction' strategies of hierarchical memory systems, but is implemented via prompt engineering.
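The compression step described above can be sketched roughly as follows. This is a minimal illustration, not a verified implementation: `summarize` is a hypothetical callable standing in for whatever model call the agent uses, `estimate_tokens` is a crude 4-characters-per-token guess rather than a real tokenizer, and the 400-token budget is simply the example figure from this report.

```python
from typing import Callable, Dict, List

TOKEN_BUDGET = 400  # example budget from the report; tune per use case


def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token). An assumption, not a real tokenizer."""
    return (len(text) + 3) // 4


def compress_tool_output(
    raw: str,
    task: str,
    summarize: Callable[[str], str],  # hypothetical: wraps the model call
    budget: int = TOKEN_BUDGET,
) -> str:
    """Return raw output if it fits the budget, otherwise a model-written summary."""
    if estimate_tokens(raw) <= budget:
        return raw  # small outputs pass through untouched

    prompt = (
        f"Summarize the tool output below in at most {budget} tokens. "
        f"Keep only fields and facts relevant to this task: {task}\n\n"
        f"{raw}"
    )
    summary = summarize(prompt)

    # Hard cap in case the model overshoots the requested budget.
    max_chars = budget * 4
    if len(summary) > max_chars:
        summary = summary[:max_chars]
    return summary


def append_tool_result(
    history: List[Dict[str, str]],
    raw: str,
    task: str,
    summarize: Callable[[str], str],
) -> None:
    """Append the compressed (never the raw) tool output to the conversation history."""
    content = compress_tool_output(raw, task, summarize)
    history.append({"role": "tool", "content": content})
```

Passing the current task into the prompt is a design choice in this sketch: it gives the model a basis for deciding which fields are relevant. The final character cap is only a safety net and reintroduces plain truncation if the summary itself runs long, so it is worth checking how often it triggers.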
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:33:52.089479+00:00— report_created — created