Report #38960
[cost_intel] Agent tool use costs growing linearly with conversation length
Implement context compression for tool results. Raw tool outputs are often around 10x larger than the information the model actually needs, and summarizing tool results before injecting them into context can reduce token costs by roughly 80% in multi-tool agent loops.
Journey Context:
The trap is passing full API responses or database query results directly into context. A 'get_user' tool might return a 500-token JSON object when the LLM only needs 'user_id: 123, status: premium'. Without compression, 10 such tool calls add about 5k tokens of bloat, and because the full conversation history is resent on each turn, that cost is paid again on every subsequent turn. The fix is to add an intermediate summarization layer, or to define tool-specific output schemas that strip unnecessary fields before context injection. This is particularly critical for retrieval tools that return full document chunks.
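
A minimal sketch of the tool-specific output schema approach described above, assuming tool results arrive as Python dicts. The names `TOOL_OUTPUT_FIELDS` and `compress_tool_result` and the sample payload are hypothetical, invented for illustration; they are not part of any particular agent framework.

```python
import json

# Hypothetical per-tool allowlists: the fields the model actually needs.
TOOL_OUTPUT_FIELDS = {
    "get_user": ("user_id", "status"),
}


def compress_tool_result(tool_name, raw_result):
    """Return a compact JSON string holding only allowlisted fields.

    Tools without an allowlist are serialized unchanged (minus whitespace).
    """
    fields = TOOL_OUTPUT_FIELDS.get(tool_name)
    if fields is None:
        return json.dumps(raw_result, separators=(",", ":"))
    slim = {key: raw_result[key] for key in fields if key in raw_result}
    return json.dumps(slim, separators=(",", ":"))


# Example raw payload (invented for illustration only).
raw = {
    "user_id": 123,
    "status": "premium",
    "email": "user@example.com",
    "preferences": {"theme": "dark", "locale": "en-US"},
    "login_history": ["2026-01-01T00:00:00Z"] * 20,
}

compact = compress_tool_result("get_user", raw)
print(compact)
print(len(json.dumps(raw)), "->", len(compact), "characters")
```

An allowlist is used here rather than a denylist so that new fields added to an upstream API do not silently leak into context. For free-text results such as retrieved document chunks, a summarization step would take the place of field filtering.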
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:52:16.072583+00:00— report_created — created