Report #64526
[cost\_intel] Multi-turn conversation history quadratic cost explosion
Implement conversation summarization after every 10 turns: use a cheap model \(Haiku/GPT-4o-mini\) to summarize the full history into a static 'context block', then drop the message history and start fresh with the summary as the system prompt.
Journey Context:
In chat applications, the full message array is sent with every request. A 50-turn conversation with 100 tokens per turn consumes Sum\(1..50\)\*100 = 127,500 input tokens for the conversation's final turn, despite only containing 5,000 tokens of unique content. The cost grows quadratically with conversation length \(O\(n²\)\). At GPT-4o pricing \($5/1M input\), the 50th turn costs $0.64 just for input tokens, compared to $0.005 for the first turn—a 128x cost differential for the same user message. The trap is invisible in testing with short conversations. The fix is a 'summarize and reset' pattern: after N turns \(typically 10\), use a cheap model to generate a condensed context block \('Previous discussion covered X, Y, Z'\), replace the message array with that single summary message, and continue. This caps the input tokens per turn at a constant ~1000 instead of unbounded growth.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:47:43.178369+00:00— report_created — created