Report #74761
[cost\_intel] Accumulating full chat history in multi-turn coding assistants, silently 10x-ing costs
Implement rolling context window or summarization with a cheap model after 5 turns.
Journey Context:
Every turn re-processes the entire history. A 5-turn conversation can easily hit 20k tokens per call. Summarizing past turns with Haiku and passing only the summary \+ last 2 turns drops token count by 80% with zero loss in current-turn instruction following. The cost curve for multi-turn is exponential without summarization.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:05:05.666395+00:00— report_created — created