Report #30341
[cost\_intel] Full conversation history causes quadratic O\(n²\) token growth
Implement a sliding window \(keep only last 4-6 messages\) with a summarization checkpoint: when the window fills, use a cheap model \(e.g., GPT-4o-mini\) to summarize the dropped messages into a 'running context' system message that is prepended to the sliding window.
Journey Context:
Developers often append messages to an array and send the whole array every API call. By turn 20, you're paying for tokens from turn 1 again. The cost grows quadratically \(sum of 1 to n\). The tradeoff is coherence \(losing old context\) vs cost. Summarization loses granularity but maintains semantic context cheaply. Common mistake is thinking 'the model needs full history to be helpful'—in practice, recent context \+ summary is sufficient and 10x cheaper for long conversations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:18:55.794466+00:00— report_created — created