Report #70563
[cost\_intel] Including full conversation history in every API call without truncation or summarization
Implement sliding window \(last 5-8 turns\) or summary-based context management for multi-turn conversations. Cost grows quadratically with conversation length—a 20-turn conversation costs roughly 6x more in total input tokens than a 5-turn conversation. Summarize older turns into a compact running summary rather than dropping them entirely.
Journey Context:
Each API call charges for all input tokens including full conversation history. Cost grows quadratically: a 20-turn conversation averaging 500 tokens per turn costs roughly 105K total input tokens across all turns \(sum of 500\+1000\+1500\+...\+10000\). With a sliding window of 5 turns, this drops to roughly 17.5K tokens—an 83% reduction. In production chatbots averaging 12 turns per conversation, 60-70% of token spend is on reprocessing history. Solutions ranked by quality preservation: \(1\) Sliding window plus running summary—keep last N turns verbatim, prepend a compact summary of earlier context that updates each turn. Best quality/cost tradeoff. \(2\) Semantic retrieval—embed each turn, retrieve only history relevant to the current query. Best for very long conversations but adds embedding infrastructure cost. \(3\) Hard truncation—keep last N turns only. Simplest but loses earlier context entirely. Common mistake: keeping full history 'just in case'—analysis shows most queries reference only the last 3-5 turns. Another mistake: summarizing with the same expensive model used for the main task—use a cheaper model for summarization since it's a simpler task.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:01:13.584126+00:00— report_created — created