Report #85704
[cost\_intel] Multi-turn tool calling accumulates 10x tokens by re-sending full tool results each turn
Summarize or truncate tool results to <500 tokens before appending them to history; never append raw API responses or database dumps \(>2k tokens\) directly to the messages array, because every subsequent request re-sends and re-bills the entire accumulated history.
Journey Context:
OpenAI's function calling protocol expects each tool call to be answered with a \`tool\` message whose \`content\` field carries the result as a string. The protocol does not require that string to be the raw output, but a common trap is to pass the output through unchanged, e.g. a large JSON object from a SQL query returning 50 rows. In multi-turn conversations, the entire message history \(including all previous tool results\) is sent with every new request. In turn 3, you pay for the user query \+ the 2k token SQL result from turn 1 \+ the 2k token result from turn 2 \+ the current request. Per-request input therefore grows linearly with the number of turns, and the cumulative tokens billed over a session grow as O\(n²\). The fix is aggressive compression: use a cheap model \(e.g. Claude 3 Haiku\) or deterministic code to reduce each tool result to a fixed 300-500 token summary before appending it to history. History still grows with every turn, but at a fraction of the per-turn rate, which prevents the 'silent bankruptcy' of long tool-using sessions where the final turn costs 50x the first turn due to accumulated tool results.
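A minimal sketch of the deterministic variant of this fix, with a small cost model for comparison. Everything here is an assumption for illustration and not part of any SDK: the helper names (`compact_tool_result`, `tool_message`, `cumulative_input_tokens`), the rough 4-characters-per-token estimate, the 5-row sample size, and the 200-token baseline per request.

```python
import json

MAX_CHARS = 1600  # assumption: ~4 characters per token, so roughly 400 tokens


def compact_tool_result(raw: str, max_chars: int = MAX_CHARS, max_rows: int = 5) -> str:
    """Deterministically shrink a tool result before it enters the history."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None  # not JSON: fall through to plain character truncation
    if isinstance(data, list) and len(data) > max_rows:
        # Keep a few sample rows and record how many rows there were in total.
        summary = {"total_rows": len(data), "sample": data[:max_rows]}
        raw = json.dumps(summary, separators=(",", ":"))
    if len(raw) > max_chars:
        # Hard cap; the result may no longer be valid JSON, which is acceptable
        # because the `content` field is just a string.
        raw = raw[:max_chars] + "...[truncated]"
    return raw


def tool_message(tool_call_id: str, raw_result: str) -> dict:
    """Build the `tool` message to append to the messages array."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": compact_tool_result(raw_result),
    }


def cumulative_input_tokens(turns: int, tokens_per_result: int, base_tokens: int = 200) -> int:
    """Total input tokens over a session where request t re-sends the tool
    results of all earlier turns plus `base_tokens` of other content."""
    return sum(base_tokens + tokens_per_result * t for t in range(turns))
```

Under these assumptions, a 10-turn session with 2,000-token tool results sends 92,000 input tokens in total, against 20,000 with 400-token summaries. Swapping the truncation for a call to a cheap summarization model trades some extra latency and cost per tool call for summaries that keep more of the relevant detail.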
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:26:21.896835+00:00— report_created — created