Report #52211
[cost\_intel] Tool result tokens accumulate in conversation history, causing quadratic cost growth in multi-turn agents
Summarize each tool result to under 100 tokens with a cheap secondary model \(Haiku/GPT-3.5\) before appending it to history, or truncate the history to the last N turns after tool execution.
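The truncation alternative can be sketched as follows. This is a minimal illustration, not a reference implementation: the message shape (dicts with a `role` key), the function name `truncate_history`, and the definition of a "turn" (one assistant message plus the tool messages that follow it) are all assumptions made for the example.

```python
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def truncate_history(messages: List[Message], keep_last_turns: int) -> List[Message]:
    """Keep the leading task messages plus the last N agent turns.

    Assumption: a "turn" is an assistant message together with every
    message that follows it up to the next assistant message.
    """
    # Everything before the first assistant message (system prompt,
    # original user task) is treated as a prefix that is always kept.
    first_assistant = next(
        (i for i, m in enumerate(messages) if m["role"] == "assistant"),
        len(messages),
    )
    prefix, rest = messages[:first_assistant], messages[first_assistant:]

    if keep_last_turns <= 0:
        return prefix

    # Index of the first message of each turn within `rest`.
    turn_starts = [i for i, m in enumerate(rest) if m["role"] == "assistant"]
    if len(turn_starts) <= keep_last_turns:
        return list(messages)  # nothing to drop yet

    cut = turn_starts[-keep_last_turns]
    return prefix + rest[cut:]
```

Cutting on turn boundaries rather than at an arbitrary message index keeps each tool result next to the assistant message that requested it, which some chat APIs require. Whether dropping older turns is acceptable depends on how much the planner relies on earlier results.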
Journey Context:
In ReAct agents, each tool call returns a JSON result \(often 1k-5k tokens\) that is appended to the message history. On every subsequent agent step the entire history, including that JSON, is sent again as input. A 1k-token result that stays in history for 5 more steps is therefore billed 5 times \(5k input tokens for a single 1k result\), and every later tool result adds to the same pile. Per-step input grows linearly with the number of turns, so cumulative cost grows roughly quadratically rather than exponentially; in practice this token snowball can make agent conversations prohibitively expensive within a handful of turns. One fix is aggressive summarization: pass the raw tool output to a cheap model \(Haiku at $0.25/1M tokens\) with an instruction such as 'summarize this for the planner in under 50 words', then append only the summary to the main agent's history. This keeps per-turn context growth small and flattens the cost curve.
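The arithmetic above can be checked with a small cost model. This is a sketch under stated assumptions: the full history is resent on every step, only input tokens are counted, and the names `cumulative_input_tokens` and `compact_tool_result` are hypothetical. The `summarize` callable stands in for a call to a cheap model and is not a real API.

```python
from typing import Callable, List


def cumulative_input_tokens(base_tokens: int, per_turn_tokens: List[int]) -> int:
    """Total input tokens billed when the full history is resent each turn.

    base_tokens: system prompt plus task, sent on every turn.
    per_turn_tokens[i]: tokens appended to history after turn i
    (for example, a tool result or its summary).
    """
    total = 0
    history = base_tokens
    for added in per_turn_tokens:
        total += history   # this turn's request carries everything so far
        history += added   # the new result is resent on all later turns
    return total


def compact_tool_result(
    raw: str,
    summarize: Callable[[str], str],  # hypothetical: wraps a cheap-model call
    max_words: int = 50,
) -> str:
    """Summarize a raw tool result and hard-cap its length in words."""
    summary = summarize(raw)
    # Guard in case the summarizer ignores the length instruction.
    return " ".join(summary.split()[:max_words])


# Five turns, 500-token base prompt, raw 1k-token results vs 80-token summaries.
raw_cost = cumulative_input_tokens(500, [1000] * 5)
summarized_cost = cumulative_input_tokens(500, [80] * 5)
print(raw_cost, summarized_cost)  # 12500 3300
```

With a fixed result size r and base prompt b, the total over n turns is n·b + r·n(n-1)/2, which is where the quadratic term comes from. The model ignores the summarizer's own cost: each raw result is still billed once, at the cheaper model's rate.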
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:07:57.323712+00:00— report_created — created