Report #50393
[cost_intel] Tool result re-injection causing O(n²) token cost growth in multi-turn agent conversations
Implement aggressive context pruning that extracts only the essential data from tool results before re-injection. Use a 'scratchpad' pattern, in which a cheap model processes each tool result and extracts a summary before anything is sent to the expensive reasoning model. Avoid sending raw API responses or database results directly into context.
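A minimal sketch of both ideas, under stated assumptions: `prune_rows` and `reinject` are hypothetical helper names, not part of any SDK, and `summarize` is a placeholder callable standing in for whatever cheap-model call you use. The field names and size threshold are illustrative only.

```python
from __future__ import annotations

import json
from typing import Any, Callable


def prune_rows(rows: list[dict[str, Any]], keep_fields: list[str],
               max_rows: int = 5) -> dict[str, Any]:
    """Keep only the selected fields of the first `max_rows` rows.

    The total row count is preserved so the reasoning model still knows
    how much data existed, even though most of it is not re-injected.
    """
    return {
        "total_rows": len(rows),
        "shown_rows": [
            {key: row[key] for key in keep_fields if key in row}
            for row in rows[:max_rows]
        ],
    }


def reinject(raw_result: str, summarize: Callable[[str], str],
             max_chars: int = 2000) -> str:
    """Scratchpad pattern: small results pass through unchanged, large ones
    are condensed by `summarize` (assumed to wrap a cheap model) first."""
    if len(raw_result) <= max_chars:
        return raw_result
    return summarize(raw_result)


# Hypothetical tool result: 100 database rows with a bulky, irrelevant field.
rows = [{"id": i, "name": f"user{i}", "bio": "x" * 500} for i in range(100)]
raw = json.dumps(rows)
pruned = json.dumps(prune_rows(rows, keep_fields=["id", "name"]))
print(f"raw: {len(raw)} chars, pruned: {len(pruned)} chars")
```

Which fields to keep and how many rows to show depend on the task; the point is that the decision is made before the result enters the conversation history, not after.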
Journey Context:
When building agents with tool use, developers often send the full JSON result of a tool call (e.g., a database query returning 100 rows, or a REST API response) back to the LLM. This content enters the conversation history and is billed as input tokens on every subsequent request. After 5-10 tool calls, you're paying for thousands of tokens of historical tool results that may no longer be relevant. If each result adds roughly k tokens and the full history is resent each time, request i carries about i·k tokens of tool output, so the total across n tool calls is about k·n(n+1)/2: the total conversation cost grows quadratically with the number of tool calls. The trap is thinking tool calls are 'stateless'. They're not: each result stays in your context, and is re-billed on every request, until you hit the context limit or prune it manually, for example with a summarization step.
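The quadratic growth can be checked with a few lines of arithmetic. This is a simplified model, assuming every tool result has the same size and the full history is resent on every request. The function name and the token figures are illustrative, not measurements.

```python
def cumulative_input_tokens(num_calls: int, tokens_per_result: int,
                            base_tokens: int = 0) -> int:
    """Total input tokens billed over a conversation in which each tool
    result stays in the history and is resent with every later request."""
    total = 0
    history = base_tokens
    for _ in range(num_calls):
        history += tokens_per_result  # the new result joins the history
        total += history              # the whole history is billed as input
    return total


# Assumed sizes: 2000 tokens for a raw result, 200 after pruning.
raw_10 = cumulative_input_tokens(10, 2000)
raw_20 = cumulative_input_tokens(20, 2000)
pruned_10 = cumulative_input_tokens(10, 200)
print(raw_10, raw_20, pruned_10)
```

Under these assumptions, doubling the number of tool calls from 10 to 20 nearly quadruples the billed input tokens, while shrinking each result by 10x cuts the total by the same factor at any conversation length.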
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:03:52.722236+00:00— report_created — created