Report #51549
[cost_intel] Not truncating or summarizing multi-turn conversations
Implement mid-conversation summarization after 5-8 turns. Each turn re-sends the full history, so a 10-turn conversation at 1.5K tokens per turn costs ~82.5K cumulative input tokens, versus 15K if the same content were sent once: a 5.5x cost multiplier. That multiplier grows linearly with turn count, which means total input cost grows quadratically.
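The arithmetic behind those figures can be checked with a short sketch. It assumes every turn adds the same number of tokens and that the full history is re-sent each turn; the function names are illustrative and not part of any API.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when every turn re-sends the full history."""
    # Turn n re-sends n turns' worth of content, so the total is
    # tokens_per_turn * (1 + 2 + ... + turns).
    return tokens_per_turn * turns * (turns + 1) // 2


def cost_multiplier(turns: int) -> float:
    """Cumulative multi-turn input divided by sending the content once."""
    # (tokens_per_turn * turns * (turns + 1) / 2) / (tokens_per_turn * turns)
    return (turns + 1) / 2


print(cumulative_input_tokens(10, 1500))  # 82500
print(cumulative_input_tokens(20, 1500))  # 315000
print(cost_multiplier(10))                # 5.5
```

Real conversations have uneven turn sizes, so treat this as an estimate of the growth rate, not a billing calculation.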
Journey Context:
Multi-turn conversation cost grows quadratically because the entire history is re-processed on each turn. With 1.5K tokens added per turn, the input is 1.5K on turn 1, 3K on turn 2, 4.5K on turn 3, and 15K on turn 10. Total input tokens across 10 turns: sum(1.5K × n for n=1..10) = 82.5K. A 20-turn conversation reaches 315K. The fix: after N turns (typically 5-8), summarize the conversation into 500-1000 tokens, then continue with the summary as the new history base. Quality impact is usually negligible for task-oriented conversations (users rarely reference exact wording from 6+ turns ago), but it can be significant for creative or analytical work where precise prior statements matter. For those cases, use a sliding window of the last K full turns plus a summary of earlier turns. Combine either approach with prompt caching on the system prompt prefix for additional savings.
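A minimal sketch of the sliding-window variant described above, assuming a history stored as a list of turns and a caller-supplied summarizer. `Turn`, `compact`, `stub_summarize`, and the default thresholds are hypothetical names and values chosen for illustration; in practice the summarizer would call a model and target the 500-1000 token range.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Turn:
    user: str
    assistant: str


# Hypothetical signature: (prior summary, turns to fold in) -> new summary.
Summarizer = Callable[[str, List[Turn]], str]


def compact(summary: str, turns: List[Turn], summarize: Summarizer,
            max_turns: int = 6, keep_last: int = 2) -> Tuple[str, List[Turn]]:
    """Fold older turns into the summary once history exceeds max_turns."""
    if len(turns) <= max_turns:
        return summary, turns  # under the threshold: nothing to do
    split = len(turns) - keep_last
    older, recent = turns[:split], turns[split:]
    # Older turns are replaced by a summary; recent turns stay verbatim.
    return summarize(summary, older), recent


def stub_summarize(prior: str, older: List[Turn]) -> str:
    # Placeholder for a real model call; it only records how many turns were folded.
    note = f"[{len(older)} earlier turns summarized]"
    return f"{prior} {note}" if prior else note


history = [Turn(f"q{i}", f"a{i}") for i in range(1, 8)]  # 7 turns
summary, kept = compact("", history, stub_summarize)
print(summary)                 # [5 earlier turns summarized]
print([t.user for t in kept])  # ['q6', 'q7']
```

Setting `keep_last=0` gives the plain summarize-and-continue behavior; a larger `keep_last` trades some of the savings for verbatim recall of recent turns.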
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:01:02.739213+00:00— report_created — created