Report #37780
[cost\_intel] Multi-turn conversation cost drift from linear context growth
Implement hierarchical summarization at turn 10 and every 5 turns thereafter, compressing conversation history to about 1k tokens of structured memory \(key facts, user preferences, unresolved tasks\). This keeps input cost per message from growing linearly to $0.50\+ per turn in long technical support sessions with Claude 3.5 Sonnet \(200k context at $3/M input tokens\).
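A minimal sketch of the compaction schedule described above, assuming compaction runs after a turn completes and folds everything except the last 2 turns into memory. That is one reading of the report, not a confirmed implementation. The names `Memory`, `should_compact`, `compact`, and `placeholder_summarize` are hypothetical; a real summarizer would be a model call that enforces the roughly 1k-token budget.

```python
from dataclasses import dataclass, field
from typing import Callable

# Schedule and retention values come from the report; the names are illustrative.
FIRST_COMPACTION_TURN = 10
COMPACTION_INTERVAL = 5
KEEP_VERBATIM = 2


@dataclass
class Memory:
    """Structured memory; a real summarizer would cap it near 1k tokens."""
    key_facts: list[str] = field(default_factory=list)
    user_preferences: list[str] = field(default_factory=list)
    unresolved_tasks: list[str] = field(default_factory=list)


def should_compact(turn: int) -> bool:
    """True at turn 10, then every 5 turns after that (15, 20, ...)."""
    if turn < FIRST_COMPACTION_TURN:
        return False
    return (turn - FIRST_COMPACTION_TURN) % COMPACTION_INTERVAL == 0


def compact(
    memory: Memory,
    history: list[str],
    summarize: Callable[[Memory, list[str]], Memory],
) -> tuple[Memory, list[str]]:
    """Fold all but the last KEEP_VERBATIM turns into memory; keep the rest raw."""
    split = max(len(history) - KEEP_VERBATIM, 0)
    older, recent = history[:split], history[split:]
    return summarize(memory, older), recent


def placeholder_summarize(memory: Memory, older: list[str]) -> Memory:
    """Stand-in only: truncates each old turn instead of calling a model."""
    return Memory(
        key_facts=memory.key_facts + [turn[:80] for turn in older],
        user_preferences=list(memory.user_preferences),
        unresolved_tasks=list(memory.unresolved_tasks),
    )


if __name__ == "__main__":
    memory, history = Memory(), []
    for turn in range(1, 21):
        history.append(f"turn {turn}: user message and assistant reply")
        if should_compact(turn):
            memory, history = compact(memory, history, placeholder_summarize)
    print(len(memory.key_facts), len(history))
```

Passing the summarizer in as a callable keeps the schedule testable without a model call. The prompt sent to the model would then be the serialized memory plus the turns still in `history`.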
Journey Context:
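A 200k-token window makes context feel effectively unlimited, which tempts teams to resend the full conversation history on every request. Input cost scales linearly with input tokens, so per-message cost climbs with every turn. In a support session averaging 2k tokens per turn, the request at turn 20 carries about 40k input tokens. At Claude 3.5 Sonnet rates \($3/M input tokens\), that is $0.12 of input cost for that one message, before output. By turn 100 the request carries about 200k tokens, so input cost alone reaches about $0.60 per message and the history fills the entire context window. Common failure: surprise bills where long conversations cost roughly 10x the expected spend. Solution: summarize turns 1-10 into structured memory \(a JSON blob of facts\), drop the raw history, and retain only the last 2 turns verbatim, then repeat at each later summarization point. Per-message input then stays bounded instead of growing O\(n\) with turn count, and cumulative input tokens over an n-turn conversation drop from O\(n²\) to O\(n\) with a small constant. Critical for high-volume support bots where margin per conversation is measured in cents.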
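The arithmetic above can be checked with a small cost model. This is a sketch under stated assumptions: every turn adds exactly 2k tokens, memory is a flat 1k tokens, compaction runs after turns 10, 15, 20 and so on, and the last 2 turns stay verbatim. Output tokens, system prompts, and the cost of the summarization calls themselves are left out. All function names are hypothetical.

```python
# Figures from the report; the function and constant names are illustrative.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000  # $3 per million input tokens
TOKENS_PER_TURN = 2_000
MEMORY_TOKENS = 1_000
KEEP_VERBATIM = 2
FIRST_COMPACTION_TURN = 10
COMPACTION_INTERVAL = 5


def full_history_tokens(turn: int) -> int:
    """Input tokens at a given turn when the whole history is resent."""
    return turn * TOKENS_PER_TURN


def compacted_tokens(turn: int) -> int:
    """Input tokens at a given turn when history is compacted after turns 10, 15, ..."""
    if turn <= FIRST_COMPACTION_TURN:
        # No compaction has run yet, so the request still carries everything.
        return turn * TOKENS_PER_TURN
    # Turns added since the most recent compaction (1 through COMPACTION_INTERVAL).
    since = (turn - FIRST_COMPACTION_TURN - 1) % COMPACTION_INTERVAL + 1
    return MEMORY_TOKENS + (KEEP_VERBATIM + since) * TOKENS_PER_TURN


def input_cost(tokens: int) -> float:
    """Input cost in dollars for one request."""
    return tokens * PRICE_PER_INPUT_TOKEN


def cumulative_cost(tokens_at_turn, turns: int) -> float:
    """Total input cost in dollars over turns 1..turns."""
    return sum(input_cost(tokens_at_turn(t)) for t in range(1, turns + 1))


if __name__ == "__main__":
    for turn in (20, 100):
        print(
            turn,
            round(input_cost(full_history_tokens(turn)), 3),
            round(input_cost(compacted_tokens(turn)), 3),
        )
    print(
        round(cumulative_cost(full_history_tokens, 100), 2),
        round(cumulative_cost(compacted_tokens, 100), 2),
    )
```

Under these assumptions the request at turn 100 costs about $0.60 with full history and about $0.045 with compaction, and cumulative input cost over 100 turns is about $30.30 versus $3.30. Real savings would be somewhat smaller once the summarization calls are counted.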
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T17:53:42.047095+00:00— report_created — created