Report #29594
[cost_intel] Unbounded conversation history growing token costs linearly with turn count
Implement a context budget: cap conversation history at N turns or M tokens. For longer conversations, compress older turns into a running summary. Without a cap, a 50-turn conversation costs roughly 25x as many input tokens in total as one whose per-turn input stays at the single-turn baseline.
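A minimal sketch of such a budget, under stated assumptions: `apply_budget`, `estimate_tokens`, and `naive_summarize` are hypothetical names, the 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer, and `naive_summarize` is only a stand-in for the model call that would produce a real summary.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text)


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Swap in a real tokenizer if available.
    return max(1, len(text) // 4)


def apply_budget(
    summary: str,
    turns: List[Turn],
    max_turns: int,
    max_tokens: int,
    summarize: Callable[[str, List[Turn]], str],
) -> Tuple[str, List[Turn]]:
    """Keep the newest turns within both caps; fold evicted turns into the summary."""
    kept = list(turns)
    evicted: List[Turn] = []
    # Always keep at least the newest turn, even if it alone exceeds max_tokens.
    while len(kept) > 1 and (
        len(kept) > max_turns
        or sum(estimate_tokens(text) for _, text in kept) > max_tokens
    ):
        evicted.append(kept.pop(0))  # drop the oldest turn first
    if evicted:
        summary = summarize(summary, evicted)
    return summary, kept


def naive_summarize(summary: str, evicted: List[Turn]) -> str:
    # Placeholder for a model call: keeps the first 40 characters of each evicted turn.
    # A real summarizer should also cap the summary's own length.
    notes = [f"{role}: {text[:40]}" for role, text in evicted]
    return " | ".join(([summary] if summary else []) + notes)


history = [("user", f"message {i}") for i in range(20)]
summary, kept = apply_budget("", history, max_turns=5, max_tokens=1000,
                             summarize=naive_summarize)
```

On each new turn, the caller would send `summary` plus `kept` instead of the full history. Passing a `summarize` that returns its first argument unchanged gives a plain sliding window with no extra model call.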
Journey Context:
Every turn in a multi-turn conversation re-sends the full history as input tokens. A conversation with 50 turns of ~500 tokens each means ~25K input tokens on the final turn, most of which is irrelevant to the current request. The cost compounds: you pay for all prior turns on every new turn, so per-turn input grows linearly and cumulative input grows roughly quadratically with turn count. The fix is a sliding window (keep the last N turns verbatim) or a summarization approach (compress older turns into a running summary). The sliding window is simpler and cheaper (no summarization call), but loses information. Summarization preserves more context at the cost of an extra model call. For most coding agent use cases, a window of 10-15 turns with summarization of older context is the right tradeoff. The key metric: average input tokens per turn should stay roughly constant, not grow linearly.
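The arithmetic behind these figures can be checked with a short sketch. It assumes every turn adds exactly 500 tokens to the history, and that the windowed variant sends only the last 12 turns (a value picked from the 10-15 range above) while ignoring the tokens of the running summary itself. `input_tokens_per_turn` is a hypothetical helper, not part of any library.

```python
from typing import List, Optional


def input_tokens_per_turn(turns: int, tokens_per_turn: int,
                          window: Optional[int] = None) -> List[int]:
    """Input tokens sent on each turn, with or without a sliding window."""
    costs = []
    for k in range(1, turns + 1):
        sent_turns = k if window is None else min(k, window)
        costs.append(sent_turns * tokens_per_turn)
    return costs


unbounded = input_tokens_per_turn(50, 500)
windowed = input_tokens_per_turn(50, 500, window=12)
flat_baseline = 50 * 500  # every turn sends only its own 500 tokens

print(unbounded[-1])                   # 25000 input tokens on the final turn
print(sum(unbounded))                  # 637500 cumulative input tokens
print(sum(unbounded) / flat_baseline)  # 25.5, the source of the "25x" figure
print(windowed[-1], sum(windowed))     # 6000 267000
```

With the window in place, per-turn input plateaus at 6,000 tokens from turn 12 onward, which is the "roughly constant" behavior the key metric asks for. A real summary would add a small, bounded overhead on top of that plateau.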
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:03:53.043522+00:00— report_created — created