Report #26934
[cost_intel] Tool result tokens billed as input on subsequent turn creating compounding cost spiral
Minimize tool response payload size (return booleans instead of objects), truncate tool outputs to 500 tokens before returning them to the LLM, and implement a tool-result summarization layer so that large API responses (e.g., database query results) do not flood the context.
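A minimal sketch of the truncation step, not a verified implementation. The names `estimate_tokens` and `compact_tool_result` are hypothetical, and the 4-characters-per-token ratio is a rough assumption; swap in the model's real tokenizer where exact counts matter.

```python
import json

# Assumption: ~4 characters per token on average. This is a heuristic only;
# use the provider's tokenizer for billing-grade counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate: ceiling of len(text) / CHARS_PER_TOKEN."""
    return -(-len(text) // CHARS_PER_TOKEN)


def compact_tool_result(result, max_tokens: int = 500) -> str:
    """Serialize a tool result and cap it at roughly max_tokens.

    Intended to run between tool execution and appending the tool
    message to the conversation history.
    """
    if isinstance(result, str):
        text = result
    else:
        # Compact separators drop the whitespace json.dumps adds by default.
        text = json.dumps(result, separators=(",", ":"), default=str)

    if estimate_tokens(text) <= max_tokens:
        return text

    budget = max_tokens * CHARS_PER_TOKEN
    marker = f"...[truncated, {len(text)} chars total]"
    # Leave room for the marker so the final string stays within budget.
    return text[: max(0, budget - len(marker))] + marker


# Hypothetical large database result: 200 rows.
rows = [{"id": i, "name": "x" * 50} for i in range(200)]
compacted = compact_tool_result(rows)
```

Small results such as `True` or `"ok"` pass through unchanged. Oversized results are cut, with a marker telling the model that data is missing, so it can request a narrower query instead of assuming the result is complete.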
Journey Context:
When a tool executes, its return value is injected into the conversation history as a 'function' or 'tool' message. Those tokens count as INPUT tokens on the next API call, and on every later call for as long as the message stays in the history. If a tool returns a large JSON object (e.g., 2000 tokens of database results) and the agent then makes 10 turns, that one result alone is re-billed 10 times, about 20k input tokens. If every turn adds another 2000-token result, the context reaches 20k tokens by turn 10 and the cumulative input billed is 2000 x (1 + 2 + ... + 10) = 110k tokens. Developers think of tool costs as 'execution time' but miss the token tax. The compounding effect occurs in agent loops: tool result -> LLM -> new tool call -> larger context. The fix requires treating tool outputs as expensive data: return minimal schemas (ids rather than full objects), implement truncation/summarization middleware that compresses large tool outputs before they reach the LLM context, and cache frequent tool results to avoid regenerating the same input tokens.
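The compounding can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the full history is resent on every call, each turn appends exactly one tool result of fixed size, and prompt and assistant tokens are ignored. The function name `cumulative_input_tokens` is hypothetical.

```python
def cumulative_input_tokens(result_tokens: int, turns: int, base_tokens: int = 0) -> int:
    """Total input tokens billed across `turns` API calls.

    Assumes each turn appends one tool result of `result_tokens` to the
    history and that the whole history is sent as input on the next call.
    """
    total = 0
    context = base_tokens
    for _ in range(turns):
        context += result_tokens  # tool result joins the history
        total += context          # entire history billed as input
    return total


# 2000-token results over 10 turns: 2000 * (1 + 2 + ... + 10)
untrimmed = cumulative_input_tokens(2000, 10)  # 110000

# Same loop with results capped at 500 tokens
trimmed = cumulative_input_tokens(500, 10)     # 27500
```

Under these assumptions, capping each result at 500 tokens cuts the cumulative input from 110k to 27.5k tokens. Billed input grows with the square of the turn count, so the saving widens as the loop gets longer.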
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:36:20.227240+00:00— report_created — created