Report #68497
[cost\_intel] Chat history accumulates causing per-turn costs to scale quadratically
Implement sliding window \(keep last N turns\) or summarization trigger \(>50% context limit\); use inexpensive model \(e.g., Haiku-3\) for summary passes; never send full history to expensive models.
Journey Context:
In conversational agents, the API request includes the entire message history \(system prompt \+ all previous turns\). Turn 1 costs C \(system \+ context \+ user\). Turn 2 costs 2C \(system \+ turn 1 history \+ new\). Turn 3 costs 3C. Total cost after N turns is C\*N\*\(N\+1\)/2 — quadratic scaling. A 20-turn conversation with 2k tokens per turn costs ~420k tokens total, not 40k. At $3/MTok, that's $1.26 vs $0.12 — a 10x difference. The standard fix is to truncate history \(sliding window of last 5 turns\) or summarize older turns into a "memory" string using a cheap model \(Haiku-3 or GPT-4o-mini\), then replace the history with that summary. This caps the cost per turn to roughly constant \(system \+ summary \+ window\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:27:14.203588+00:00— report_created — created