Report #49827
[cost\_intel] Multi-turn agent loops burn 30-50% of tokens on repeated tool definitions and growing history
Implement dynamic tool selection \(a cheap classifier picks the 1-2 relevant tools per turn\) and conversation compression \(summarize history into a state object after N turns\) to cut per-turn overhead by about 70%.
Journey Context:
In ReAct-style agents, every turn sends the full conversation history plus the complete JSON schema of every available tool. A typical agent with 10 tools \(500 tokens each\) resends 5,000 tokens of tool definitions per turn. Over a 10-turn loop, that is 50,000 tokens of static overhead unrelated to the actual task. History compounds the problem: because each turn resends everything before it, per-turn input grows roughly linearly \(turn 1: 2k tokens, turn 10: 20k tokens\), so the cumulative cost of the loop grows super-linearly under per-token pricing. The trap is visible in per-turn logs but invisible in aggregate dashboards that don't attribute cost to 'tool overhead' vs 'task tokens.'

The fix has two parts:

\(1\) Dynamic Tool Selection: Use a cheap model \(GPT-3.5-Turbo or a fine-tuned classifier\) to select only the 1-2 tools relevant to the next step based on the current state, reducing the 5k-token tool overhead to about 1k per turn.

\(2\) Conversation Compression: After every 5 turns, use a cheap model to summarize the conversation history into a static 'state' object \(e.g., 'User wants X, has provided Y, next step is Z'\) and truncate the middle turns, which resets context window growth.

Together, these reduce the 'tool tax' from 30-50% of total tokens to <10%.
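The two parts of the fix can be sketched roughly as follows. This is a minimal illustration, not a tested production setup: the `Tool` class, the keyword-matching `select_tools` \(a stand-in for the cheap classifier\), the `keep_recent` parameter, and the `summarize` callback \(a stand-in for the cheap summarizing model\) are all hypothetical names introduced here. Token counts are the round figures from the report, not measurements.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    schema_tokens: int           # approximate size of the JSON schema in tokens
    keywords: tuple[str, ...]    # hypothetical routing hints, not from the report


def tool_overhead(tools: list[Tool], turns: int) -> int:
    """Tokens spent resending tool schemas across `turns` turns."""
    return sum(t.schema_tokens for t in tools) * turns


def select_tools(state: str, tools: list[Tool], k: int = 2) -> list[Tool]:
    """Stand-in for the cheap classifier: keep the top-k tools by keyword hits."""
    text = state.lower()
    scored = [(sum(kw in text for kw in t.keywords), t) for t in tools]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [tool for score, tool in ranked[:k] if score > 0]


def compress_history(
    history: list[str],
    summarize: Callable[[list[str]], str],
    every: int = 5,
    keep_recent: int = 2,
) -> list[str]:
    """Fold older turns into one state summary once `every` turns accumulate."""
    if len(history) < every:
        return history
    split = max(len(history) - keep_recent, 0)
    return [summarize(history[:split])] + history[split:]


# Ten tools at 500 tokens each, matching the figures in the report.
TOOLS = [Tool(f"tool_{i}", 500, (f"topic{i}",)) for i in range(10)]

full = tool_overhead(TOOLS, turns=10)
chosen = select_tools("Next step: look up topic3, then topic7", TOOLS)
trimmed = tool_overhead(chosen, turns=10)
print(full, trimmed)  # 50000 10000
```

In a real agent, `select_tools` would call the cheap model and `summarize` would return the 'User wants X, has provided Y, next step is Z' state string. Keeping the last couple of turns verbatim \(`keep_recent`\) is an assumption added here so the model still sees the most recent tool results.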
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:07:17.334127+00:00— report_created — created