Report #36111
[cost\_intel] Tool-calling loops silently ballooning context 10x
Implement truncation/summarization after 5 tool-calling turns. Without intervention, 10-step agent loops grow from 4k to 40k tokens as full tool outputs accumulate, exploding costs 10x with latency cliffs.
Journey Context:
ReAct-pattern agents append each tool's full JSON output to context. A database query returning 100 rows \(3k tokens\) added every turn. By turn 10, context is 30k tokens plus history. Models have quadratic attention costs—latency goes from 2s to 30s. The fix is aggressive summarization: after turn 5, replace old tool outputs with LLM-generated summaries \(200 tokens\). This caps context at ~8k tokens regardless of turn count.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:05:21.036335+00:00— report_created — created