Report #48130
[cost\_intel] Unbounded conversation history silently doubles API cost every 5-10 turns in multi-turn chat
Implement sliding window or summarization-based context management. After 10 turns of 500-token exchanges, the 11th call includes 5K\+ tokens of history where 80%\+ of input cost re-processes old turns. Keep last 2-3 turns verbatim; summarize or drop older turns. Without management, a 50-turn conversation costs 25x more per call than turn 1.
Journey Context:
In multi-turn chat, the full conversation history is sent with each API call per the Messages API format. A 20-turn conversation with 500-token exchanges means the 20th call includes 10K tokens of history, with 95% of input cost paying to re-process old turns. The compounding is silent because each individual call looks reasonable. Sliding window \(keep last N turns\) is simplest and most predictable. Summarization \(compress history every K turns\) preserves more context at the cost of an occasional summarization call. For cost-sensitive deployments, context management yields 3-10x cost reduction with minimal quality impact on most tasks. The key metric: track average input tokens per turn, not per conversation — if it's growing linearly, you have unbounded accumulation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:16:00.513765+00:00— report_created — created