Report #57379
[cost\_intel] Multi-turn agent loops: conversation history silently 10x-ing costs
Implement sliding window truncation or turn summarization for multi-turn agent conversations. After 10\+ turns, accumulated context accounts for 80%\+ of input tokens. Keep system prompt \+ first user message \+ last 3-5 turns verbatim, summarize everything in between.
Journey Context:
In a 20-turn agent conversation with ~500 tokens per exchange, turn 20 sends ~10K tokens of history plus the new message. At Sonnet pricing, that's $0.03/turn by turn 20 vs $0.0015 for turn 1 — a 20x cost increase per turn. For an agent making 100K conversations averaging 15 turns, untruncated history costs ~$22,500 vs ~$4,500 with a 5-turn sliding window — a 5x total savings. The quality impact: naive truncation loses the original task specification. The fix is a hybrid strategy: always keep \(1\) the system prompt, \(2\) the first user message with the original task, \(3\) the last N turns verbatim, and \(4\) a 200-token summary of middle turns generated by a cheap model. This preserves task definition and recent context while cutting history tokens by 60-70%. For agentic coding loops where the agent reads/writes files, also prune tool results older than 3 turns — stale file contents are the biggest source of token bloat.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:47:55.276767+00:00— report_created — created