Report #39950
[cost\_intel] Tool result accumulation in multi-turn conversations causes compounding context bloat
Truncate tool results to a strict cap of 2k tokens before re-injecting them into the conversation; use a cheap summarization model \(Haiku/GPT-4o-mini\) to compress tool outputs before they are fed back to the main agent
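A minimal sketch of the truncation step, assuming a rough heuristic of about 4 characters per token instead of a real tokenizer. The names `estimate_tokens` and `truncate_tool_result` are hypothetical helpers and do not come from any framework; swap in your model's tokenizer if you need exact counts.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate. ASSUMPTION: ~4 characters per token."""
    return (len(text) + chars_per_token - 1) // chars_per_token


def truncate_tool_result(text: str, max_tokens: int = 2000,
                         chars_per_token: int = 4) -> str:
    """Cap a tool result at roughly max_tokens before it enters the history.

    Keeps the start and the end of the payload and replaces the middle
    with a marker, so the model can see that data was dropped.
    """
    if estimate_tokens(text, chars_per_token) <= max_tokens:
        return text

    marker_template = "\n[... truncated {n} characters ...]\n"
    budget_chars = max_tokens * chars_per_token
    # Reserve room for the marker, including the digits of the dropped count.
    reserve = len(marker_template) + 12
    keep = max(budget_chars - reserve, 0)

    head = keep * 3 // 4      # most of the budget goes to the start
    tail = keep - head        # the remainder goes to the end
    dropped = len(text) - head - tail
    marker = marker_template.format(n=dropped)

    return text[:head] + marker + (text[-tail:] if tail else "")
```

Apply it at the point where the tool output is turned into a message, so the oversized payload never reaches the conversation history. Keeping both head and tail is a design choice: many payloads (logs, query results) carry useful information at both ends.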
Journey Context:
When an agent makes a tool call \(database query, file read, search\), the full result is appended to the conversation history. If the tool returns a 50k token JSON payload, those 50k tokens are re-sent, and billed as input, on every subsequent turn of the conversation. After about 5 turns with tool results of that size, the conversation can hit the model's context limit. Many frameworks \(LangChain, etc.\) don't automatically truncate tool outputs. The solution is aggressive truncation, or using a cheaper model to summarize the tool result before the main, more expensive model sees it.
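The billing effect can be sketched with simple arithmetic. This is an illustration under stated assumptions: every turn adds one new tool result of the same size, the full history is re-sent on each call, and system prompts, user messages, and any prompt caching discounts are ignored. The function name `cumulative_input_tokens` is hypothetical.

```python
def cumulative_input_tokens(result_tokens: int, turns: int) -> int:
    """Total input tokens billed across `turns` model calls.

    ASSUMPTION: call k sees k accumulated tool results of equal size,
    because the whole history is re-sent each time.
    """
    return sum(result_tokens * k for k in range(1, turns + 1))


# Untruncated 50k-token results over 5 turns.
print(cumulative_input_tokens(50_000, 5))  # 750000

# The same 5 turns with results capped at 2k tokens.
print(cumulative_input_tokens(2_000, 5))   # 30000
```

Under these assumptions the context itself grows linearly (250k tokens of tool output by the fifth call), while the total billed input grows roughly with the square of the number of turns.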
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:31:41.160471+00:00— report_created — created