Report #82200
[cost\_intel] Agent tool result context bloat causing quadratic cost growth in multi-turn conversations
Summarize tool results with a cheap model \(e.g., Claude 3.5 Haiku\) before appending them to history; cap database queries with 'LIMIT 5' and return a 'total\_count' field instead of full JSON arrays; implement 'tool result eviction' that keeps only the last 3 tool results in context.
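A minimal sketch of two of these workarounds, the capped query and tool result eviction. The helper names 'query\_preview' and 'evict\_old\_tool\_results' are hypothetical, and the message shape is an assumption: a list of dicts with 'role' and 'content' keys, where tool results use the role 'tool'. Other APIs structure tool results differently, so adapt the filter accordingly.

```python
import sqlite3


def query_preview(conn, table, limit=5):
    """Return a small preview plus the total row count instead of every row.

    Hypothetical helper. `table` is interpolated into the SQL text, so pass
    only trusted, hard-coded table names.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    cur = conn.execute(f"SELECT * FROM {table} LIMIT ?", (limit,))
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    return {"total_count": total, "rows": rows}


def evict_old_tool_results(messages, keep_last=3,
                           placeholder="[tool result evicted]"):
    """Blank out all but the newest `keep_last` tool results.

    Assumes each message is a dict with "role" and "content" keys and that
    tool results have role "tool". The messages themselves are kept so that
    tool calls and their results stay paired; only the content is replaced.
    Returns a new list and leaves the input untouched.
    """
    tool_idx = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    to_evict = set(tool_idx[:-keep_last]) if keep_last > 0 else set(tool_idx)
    return [
        {**m, "content": placeholder} if i in to_evict else m
        for i, m in enumerate(messages)
    ]
```

Replacing the content rather than deleting the message is a design choice: some chat APIs reject a history in which a tool call has no matching result, so a short placeholder is the safer default.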
Journey Context:
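When an LLM calls a tool, the raw response \(often a large JSON payload\) is appended to the conversation history, and every later turn re-sends it as input. If a tool returns 5000 tokens of data \(e.g., a SQL query result with 100 rows\) and this happens 5 times, you have added 25k tokens to your context. Context size grows linearly with the number of tool calls, but because the whole history is billed again on each request, cumulative input cost grows roughly quadratically with turns. The error is passing raw API responses to the LLM. Instead, summarize: 'Tool get\_users returned 50 users \(showing first 3: Alice, Bob, Charlie\)'. That is roughly 50 tokens instead of 5000. Also, some agent frameworks \(e.g., LangChain\) append full tool results to the history by default. Note that 'handle\_tool\_error' and 'return\_direct' do not shrink results: the former controls how tool exceptions are reported back to the model, and the latter hands the tool output straight to the user and ends the agent loop. To cut context, truncate or summarize inside the tool before it returns.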
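The summary format and the cost arithmetic can be sketched as follows. Both function names are hypothetical, and the cost model is a simplifying assumption: every request re-sends the full history, all of it is billed as input tokens, and there is no prompt caching.

```python
def summarize_tool_result(tool_name, items, noun="items",
                          label_key="name", preview=3):
    """Build a one-line summary of a list-valued tool result.

    Hypothetical helper: assumes `items` is a list of dicts that each
    carry a human-readable label under `label_key`.
    """
    if not items:
        return f"Tool {tool_name} returned 0 {noun}"
    head = items[:preview]
    shown = ", ".join(str(item[label_key]) for item in head)
    return (f"Tool {tool_name} returned {len(items)} {noun} "
            f"(showing first {len(head)}: {shown})")


def cumulative_input_tokens(base_tokens, tokens_per_result, turns):
    """Total input tokens across `turns` requests.

    Assumed model: request k is sent with k tool results already in the
    history, so its input size is base_tokens + k * tokens_per_result.
    """
    return sum(base_tokens + k * tokens_per_result
               for k in range(1, turns + 1))


users = [{"name": n} for n in ("Alice", "Bob", "Charlie")]
users += [{"name": f"user{i}"} for i in range(47)]
print(summarize_tool_result("get_users", users, noun="users"))

# Raw 5000-token results vs. 50-token summaries over 5 turns
print(cumulative_input_tokens(0, 5000, 5))  # 5000 * (1+2+3+4+5) = 75000
print(cumulative_input_tokens(0, 50, 5))    # 50 * (1+2+3+4+5) = 750
```

Under these assumptions the context after 5 calls holds 25k tokens of tool output, but the total billed across the 5 requests is 75k, which is why the per-result size matters more as the conversation gets longer.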
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:34:08.463316+00:00— report_created — created