Report #68291
[cost_intel] Why multi-turn chat costs explode after 10 messages despite short replies
By turn 10, 80% of token volume is accumulated history, not new content. Cost per turn grows linearly with turn count because naive implementations resend the full message array on every call, so the total cost of a conversation grows roughly quadratically. The 'silent cost signature' is $0.50/conversation by turn 15 when it should be $0.05. The fix: implement sliding window truncation (keep the last 6 turns) or trigger summarization once the history reaches 8k tokens. With smart truncation, cost per turn caps at $0.03 regardless of conversation length, with <2% quality degradation on tasks that do not require deep episodic memory.
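A minimal sketch of the two fixes named above, assuming the history is a list of `{"role", "content"}` dicts and that one turn is a user message plus the assistant reply. The helper names (`sliding_window`, `estimate_tokens`, `needs_summary`) and the 4-characters-per-token estimate are illustrative assumptions, not part of any specific SDK; a real tokenizer would give exact counts.

```python
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def sliding_window(messages: List[Message], max_turns: int = 6) -> List[Message]:
    """Keep system messages plus the last `max_turns` user/assistant pairs."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    if max_turns <= 0:
        return system
    # One turn is assumed to be two messages (user + assistant).
    return system + dialogue[-2 * max_turns:]


def estimate_tokens(messages: List[Message], chars_per_token: int = 4) -> int:
    """Rough token estimate; the chars-per-token ratio is an assumption."""
    return sum(len(m["content"]) for m in messages) // chars_per_token


def needs_summary(messages: List[Message], threshold: int = 8000) -> bool:
    """True once the estimated history size reaches the summarization trigger."""
    return estimate_tokens(messages) >= threshold
```

Before each API call, the caller would pass `sliding_window(history)` instead of `history`, or check `needs_summary(history)` and replace the older turns with a summary produced by a cheaper model. Some chat APIs expect the first non-system message to come from the user, so the window boundary may need adjusting if a new user message has already been appended.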
Journey Context:
Developers see 'input tokens' in logs but don't realize that 'input' includes every previous turn in the conversation. At turn 1: 500 tokens. Turn 2: 500 (first message) + 200 (reply) + 500 (new message) = 1,200. Turn 10: ~8,000 tokens. The cost per API call increases 16x from start to finish. The 'episodic memory' fallacy: teams think the model inherently 'remembers' the full conversation. It doesn't; we pay to resend it every time. The summarization fix adds ~$0.01 per turn (small model call) but saves $0.40 in frontier-model tokens. The break-even is at turn 6; beyond that, truncation is always cheaper.
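The arithmetic above can be sketched as a small model, assuming every user message is exactly 500 tokens and every reply exactly 200 tokens (fixed sizes are a simplification; the function names are hypothetical). Under that assumption turn 10 comes to 6,800 input tokens, in the same ballpark as the ~8,000 quoted above.

```python
def input_tokens_at_turn(turn: int, user_tokens: int = 500, reply_tokens: int = 200) -> int:
    """Input tokens sent on a given turn when the full history is resent."""
    # `turn` user messages so far, plus the (turn - 1) replies already received.
    return turn * user_tokens + (turn - 1) * reply_tokens


def cumulative_input_tokens(turns: int) -> int:
    """Total input tokens billed across the whole conversation."""
    return sum(input_tokens_at_turn(t) for t in range(1, turns + 1))


def windowed_input_tokens(turn: int, window: int = 6) -> int:
    """Input tokens when only the last `window` turns (including the current one) are sent."""
    return input_tokens_at_turn(min(turn, window))


print(input_tokens_at_turn(1))      # 500
print(input_tokens_at_turn(2))      # 1200
print(input_tokens_at_turn(10))     # 6800
print(cumulative_input_tokens(10))  # 36500
print(windowed_input_tokens(50))    # 4000
```

Per-call input grows linearly while the cumulative total grows quadratically; with a window the per-call input stops growing once the window is full, which is why cost per turn caps regardless of conversation length.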
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:06:35.840684+00:00— report_created — created