Report #51171
[cost_intel] Multi-turn agent conversations costing 10-50x more than single-turn tasks
Implement context window management: (1) summarize older turns when context exceeds 50% of window, (2) use retrieval instead of including full documents in conversation, (3) compress tool/API outputs before including them, (4) set max_turn limits. Budget $0.05-0.50 per conversation for complex agents.
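A minimal sketch of items (1) and (4), assuming a hypothetical `Turn` record, a caller-supplied `summarize` callback (for example, a call to a cheaper model), and a crude 4-characters-per-token estimate. None of these names come from a real library; swap in your provider's tokenizer for real counts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Use a real tokenizer in production.
    return max(1, len(text) // 4)


def compact_history(
    turns: List[Turn],
    context_window: int,
    summarize: Callable[[List[Turn]], str],
    keep_recent: int = 2,
    threshold: float = 0.5,
) -> List[Turn]:
    """Replace older turns with one summary turn once history exceeds
    `threshold` of the context window. The last `keep_recent` turns stay verbatim."""
    total = sum(estimate_tokens(t.text) for t in turns)
    if total <= threshold * context_window or len(turns) <= keep_recent:
        return turns
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [Turn("system", summarize(older))] + recent


def turn_limit_reached(turns: List[Turn], max_turns: int) -> bool:
    # Hard stop for runaway conversations (item 4).
    return len(turns) >= max_turns
```

Call `compact_history` before each model request and `turn_limit_reached` before accepting a new user turn. The 50% threshold follows the recommendation above; `keep_recent=2` is an arbitrary default.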
Journey Context:
Token math is brutal. In a 5-turn conversation where each turn adds 1K of new input and 500 tokens of output, turn 5 sends ~7K input tokens (each turn resends all previous turns), and the five turns together bill ~20K input tokens. A 20-turn conversation easily passes 100K input tokens in total (about 305K at the same per-turn sizes): per-turn input grows linearly, so cumulative input grows quadratically. The 'cheap' 500-token outputs become expensive 500-token inputs on every subsequent turn. Mitigations ranked by effectiveness: (1) aggressive summarization: replace turns 1-3 with a summary before turn 5, cutting roughly 60% of accumulated tokens, (2) RAG instead of context stuffing: don't include the full 50K document in the conversation, retrieve the relevant 2K chunks, (3) tool result compression: a 10K API response can be summarized to 500 tokens before injection. The diagnostic signature: your cost per conversation has high variance, and the expensive conversations are always the long ones. Track tokens_per_conversation as a P95 metric.
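The arithmetic can be checked with a short sketch. It assumes every turn adds a fixed 1K tokens of new input and 500 tokens of output, that the full history is resent each turn, and that there is no system prompt or caching. The function names are illustrative only.

```python
def input_tokens_at_turn(n: int, new_input: int = 1000, output: int = 500) -> int:
    # Turn n resends n user messages plus the (n - 1) earlier model replies.
    return n * new_input + (n - 1) * output


def cumulative_input_tokens(turns: int, new_input: int = 1000, output: int = 500) -> int:
    # Total input tokens billed across the whole conversation.
    return sum(input_tokens_at_turn(n, new_input, output) for n in range(1, turns + 1))


print(input_tokens_at_turn(5))       # 7000
print(cumulative_input_tokens(5))    # 20000
print(input_tokens_at_turn(20))      # 29500
print(cumulative_input_tokens(20))   # 305000
```

Per-turn input is linear in the turn number, so the conversation total is quadratic: 4x the turns costs roughly 15x the input tokens in this example.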
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:22:49.176689+00:00— report_created — created