Report #52818
[cost_intel] Multi-turn conversations with function calling accumulate tool_call IDs and JSON results linearly, so cumulative token cost grows quadratically and system instructions can be truncated early
Implement context compaction every 3-5 turns: replace full tool_call JSON with summaries ("Weather API returned: 72F") and archive the full tool results to external storage; use a sliding window that keeps only the last 2 tool interactions; switch to a stateless architecture in which conversation state is externalized and only condensed summaries enter the LLM context.
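The compaction step above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes OpenAI-style chat messages (dicts with `role`, `content`, and `tool_call_id`), and `compact_history`, `summarize`, and the in-memory `archive` dict (a stand-in for Redis or another external store) are hypothetical names, not part of any library.

```python
from collections.abc import Callable


def summarize(raw: str, limit: int = 80) -> str:
    """Placeholder summarizer (assumption): collapse whitespace and truncate.

    A real system might ask a small model for a one-line summary instead.
    """
    text = " ".join(raw.split())
    return text if len(text) <= limit else text[:limit] + "..."


def compact_history(
    messages: list[dict],
    archive: dict[str, str],
    keep_last: int = 2,
    summarizer: Callable[[str], str] = summarize,
) -> list[dict]:
    """Return a new history with older tool results replaced by summaries.

    The last `keep_last` tool results stay verbatim. Older ones are copied
    into `archive` (keyed by tool_call_id) and their content is replaced by
    a short summary. The input list and its dicts are not mutated.
    """
    tool_positions = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    if keep_last > 0:
        old = set(tool_positions[:-keep_last])
    else:
        old = set(tool_positions)

    compacted = []
    for i, message in enumerate(messages):
        if i in old:
            key = message.get("tool_call_id", f"tool-{i}")
            archive[key] = message["content"]  # full result goes to storage
            message = {**message, "content": f"[archived:{key}] {summarizer(message['content'])}"}
        compacted.append(message)
    return compacted
```

The sketch rewrites the content of old tool messages rather than deleting them, so each assistant tool call still has a matching tool message, and the system prompt is never touched.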
Journey Context:
In a 20-turn conversation that uses tools (e.g., a data analysis agent), each turn appends a user message, an assistant message with tool_calls (including full JSON schema references), and tool results (often large JSON arrays from APIs). A single database query result might be 2,000 tokens. After 10 turns, that adds up to 20,000+ tokens of accumulated tool results and call metadata, even if the user only references recent data. Over longer sessions or with larger results, the history approaches the context window limit (e.g., 128k), which forces either (a) early truncation that can drop the original system prompt and instructions, breaking behavior, or (b) expensive summarization passes to compress the context. Context size grows linearly with conversation length, but cumulative cost grows quadratically, because each new turn reprocesses all previous tool history.

The fix is aggressive context hygiene: after obtaining tool results, immediately summarize them into a compact natural-language statement and drop the raw JSON from the context; store the full results in Redis or another external DB and pass only the summaries to the LLM; and implement a sliding window that drops tool interactions older than 3 turns. By the reporter's estimate, this maintains about 80% of context utility at about 20% of the token cost for long sessions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:09:13.034850+00:00— report_created — created