Report #48704
[cost_intel] Including full conversation history in multi-turn chat has negligible cost impact
Multi-turn conversation costs grow quadratically when the full history is resent on every turn. A 10-turn chat that adds 2K tokens per exchange consumes ~110K cumulative input tokens by turn 10. Implement a sliding window (last 3 turns verbatim + a summary of earlier turns) or a hard token budget cap so that cost grows linearly instead of quadratically.
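The quadratic-versus-linear claim can be checked with a small cost model. This is a minimal sketch under simplifying assumptions: turn k resends about k × 2K input tokens (the same model the report uses), and a running summary occupies a fixed 500 tokens. The function names, the summary size, and the exact window definition are illustrative assumptions, and $3/M is the price quoted in this report.

```python
"""Sketch: input-token cost of resending full history vs. a bounded window.

Assumptions (illustrative, not measured): each exchange adds a fixed number of
tokens, turn k resends k exchanges' worth of tokens, and a running summary
occupies a fixed token budget.
"""

PRICE_PER_M_INPUT = 3.00  # USD per 1M input tokens (the Sonnet figure quoted in the report)


def full_history_inputs(turns: int, tokens_per_exchange: int) -> list[int]:
    """Input tokens sent on each turn when the entire history is resent."""
    return [tokens_per_exchange * k for k in range(1, turns + 1)]


def windowed_inputs(turns: int, tokens_per_exchange: int,
                    window: int, summary_tokens: int) -> list[int]:
    """Input tokens per turn when only the last `window` exchanges (current one
    included) are sent verbatim, plus a fixed-size summary of anything older."""
    inputs = []
    for k in range(1, turns + 1):
        verbatim = min(k, window) * tokens_per_exchange
        summary = summary_tokens if k > window else 0
        inputs.append(verbatim + summary)
    return inputs


def cost_usd(tokens: int) -> float:
    """Input cost in USD at PRICE_PER_M_INPUT."""
    return tokens * PRICE_PER_M_INPUT / 1_000_000


if __name__ == "__main__":
    full = full_history_inputs(10, 2_000)
    win = windowed_inputs(10, 2_000, window=3, summary_tokens=500)
    for k, (f, w) in enumerate(zip(full, win), start=1):
        print(f"turn {k:2}: full={f:6,} tok (${cost_usd(f):.3f})  "
              f"windowed={w:6,} tok (${cost_usd(w):.3f})")
    print(f"total: full={sum(full):,} tok (${cost_usd(sum(full)):.2f}), "
          f"windowed={sum(win):,} tok (${cost_usd(sum(win)):.2f})")
```

With full history, per-turn input rises by 2K every turn, so the total is 2K × (10 × 11 / 2) = 110K tokens ($0.33). With the window, per-turn input plateaus at 6.5K after turn 3, so each additional turn adds a constant amount and the total (57.5K tokens here) grows linearly with conversation length.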
Journey Context:
The arithmetic is brutal and invisible to most developers. If each exchange adds ~2K tokens and the full history is resent, turn k sends roughly 2K × k input tokens: 2K at turn 1, 10K at turn 5, 20K at turn 10. Cumulative input is 2K × k(k+1)/2, which is ~30K tokens through turn 5 and ~110K through turn 10. At Sonnet pricing ($3/M input tokens), turn 1 costs $0.006 and turn 10 alone costs $0.06, a 10x per-turn increase; the full 10-turn conversation costs $0.33, 5.5x the $0.06 it would cost if each turn sent only its own 2K tokens. Over 1M conversations averaging 8 turns, later turns dominate total spend: in an 8-turn conversation, turns 5-8 account for ~72% of input tokens (52K of 72K).
Solutions, ranked by quality preservation:
(1) After turn 5, replace turns 1-N with a compact summary paragraph: typically 80% cost reduction with <5% quality loss for task-oriented chat.
(2) Sliding window of the last 3 turns plus a summary of earlier turns: 90% cost reduction, acceptable for most support/chatbot use cases.
(3) Hard token cap at 8K: simplest, but risks dropping important context.
In task-oriented conversations, users rarely reference details from turn 2 at turn 10, which makes aggressive summarization usually safe.
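The sliding-window and hard-cap options can be sketched as a small context builder. This is a minimal sketch, not any SDK's API: `Message`, the `summarize` callable (in practice likely another model call), and the ~4-characters-per-token estimate in `approx_tokens` are hypothetical stand-ins. A real implementation should count tokens with the provider's tokenizer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str


def approx_tokens(text: str) -> int:
    # Hypothetical heuristic (~4 characters per token), not a real tokenizer.
    return max(1, len(text) // 4)


def split_for_window(history: list[Message], keep_turns: int = 3):
    """Split history into (older, recent); `recent` starts at the
    `keep_turns`-th most recent user message."""
    user_idx = [i for i, m in enumerate(history) if m.role == "user"]
    if len(user_idx) <= keep_turns:
        return [], list(history)
    cut = user_idx[-keep_turns]
    return history[:cut], history[cut:]


def hard_token_cap(history: list[Message], budget: int = 8_000,
                   count: Callable[[str], int] = approx_tokens) -> list[Message]:
    """Keep the newest messages that fit within `budget` tokens; drop the rest."""
    kept, total = [], 0
    for m in reversed(history):
        t = count(m.content)
        if total + t > budget:
            break
        kept.append(m)
        total += t
    kept.reverse()
    # Start on a user message so user/assistant turns still alternate.
    while kept and kept[0].role != "user":
        kept.pop(0)
    return kept


def build_context(history: list[Message],
                  summarize: Callable[[list[Message]], str],
                  keep_turns: int = 3,
                  budget: int = 8_000) -> tuple[Optional[str], list[Message]]:
    """Sliding window + summary, with a hard token cap as a backstop.

    Returns (summary, messages). The summary is returned separately so the
    caller can place it in the system prompt rather than the message list.
    """
    older, recent = split_for_window(history, keep_turns)
    summary = summarize(older) if older else None
    # Charge the summary against the budget before capping the verbatim turns.
    remaining = budget - (approx_tokens(summary) if summary else 0)
    return summary, hard_token_cap(recent, max(remaining, 0))
```

For example, with ten completed exchanges plus a new user message and `keep_turns=3`, the sixteen oldest messages go to `summarize` and the five newest (turns 9 to 11) are sent verbatim. As written, this calls `summarize` on every turn. Caching the summary and extending it incrementally would avoid paying for a fresh summarization each time; that step is not shown.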
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:14:05.517520+00:00— report_created — created