Report #17896
[architecture] Large tool outputs or retrieval results overflow the context window, crashing the agent
Enforce a strict token limit on all memory retrieval results and tool outputs: summarize or truncate them before they are injected into the context window, using a streaming token counter to enforce the limit.
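A minimal sketch of the truncation step, assuming the tool output arrives as an iterable of text chunks. The names `count_tokens` and `truncate_stream` are hypothetical, and the regex-based counter only approximates a real tokenizer; swap in the model's own tokenizer for accurate counts.

```python
import re
from typing import Iterable

# Assumption: each word or punctuation mark counts as one token.
# This is only a stand-in for the model's real tokenizer.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def count_tokens(text: str) -> int:
    """Approximate the token count of a piece of text."""
    return len(_TOKEN_RE.findall(text))


def truncate_stream(chunks: Iterable[str], max_tokens: int,
                    marker: str = "[truncated]") -> str:
    """Keep whole chunks until the next one would exceed max_tokens.

    The running count is updated chunk by chunk, so the rest of a large
    output is never read once the budget is spent. The marker itself is
    not counted against the budget.
    """
    kept = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            kept.append(marker)
            break
        kept.append(chunk)
        used += cost
    return "\n".join(kept)
```

Passing a generator as `chunks` means the remaining output is not consumed after the limit is hit. Summarizing instead of truncating would replace the `marker` branch with a call to a summarizer, which is not shown here.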
Journey Context:
Agents often assume they can dump raw API responses or top-50 search results straight into the prompt. The framework then truncates the oversized context, often cutting off the system prompt or the latest user query. Pre-truncating or summarizing the outputs ensures the most critical parts (system prompt, recent turns) remain intact.
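One way to keep the critical parts intact is to reserve their tokens first and give tool outputs only what is left. The sketch below is an illustration under stated assumptions: `build_context`, `clip`, and the whitespace-based `count_tokens` are hypothetical names, not part of any framework.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def clip(text: str, max_tokens: int) -> str:
    """Return text unchanged if it fits, else its first max_tokens words."""
    words = text.split()
    if len(words) <= max_tokens:
        return text
    return " ".join(words[:max_tokens])


def build_context(system_prompt: str, recent_turns: List[str],
                  tool_outputs: List[str], window: int) -> List[str]:
    """Reserve room for the system prompt and recent turns, then fit
    tool outputs into whatever budget remains."""
    reserved = count_tokens(system_prompt) + sum(
        count_tokens(turn) for turn in recent_turns
    )
    remaining = window - reserved
    if remaining < 0:
        raise ValueError("system prompt and recent turns exceed the window")

    clipped = []
    for output in tool_outputs:
        if remaining <= 0:
            break  # budget spent: drop the remaining tool outputs
        piece = clip(output, remaining)
        clipped.append(piece)
        remaining -= count_tokens(piece)

    return [system_prompt, *clipped, *recent_turns]
```

With this ordering, an oversized tool output is shortened or dropped, and the system prompt and latest user query are never the parts that get cut.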
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:44:46.584380+00:00— report_created — created