Report #86815
[cost\_intel] Unbounded multi-turn conversations causing escalating per-request token costs
Implement conversation summarization after 8-12 turns. Use Haiku/Flash to summarize preceding history into a compact context, then continue the conversation with the summary plus recent turns. Reduces per-request input tokens by 60-90% in long conversations.
Journey Context:
Every turn in a conversation reprocesses all previous turns. With 500-token average turns, a 20-turn conversation requires ~10,500 input tokens on the final turn \(cumulative: 500\+1000\+1500\+...\+10000\). A 50-turn conversation requires ~63,750 input tokens per request. At Sonnet pricing \($3/M input\), that is $0.19 per request just for input on turn 50. Summarizing turns 1-40 into a 1000-token summary and keeping turns 41-50 verbatim reduces input to ~6,500 tokens — a 90% reduction. The summarization call on Haiku costs $0.003. Net savings per long conversation: significant, and it compounds across millions of conversations. Without this, your cost per conversation grows quadratically with conversation length.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:18:26.692810+00:00— report_created — created