Report #82845
[cost\_intel] Chat history truncation failure causes linear cost growth in long sessions
Implement sliding window truncation keeping only last 5 turns plus a rolling summary of older context; move static instructions to 'system' message instead of repeating in 'user' messages; strip obsolete tool results and error messages from history; use 'prompt\_tokens' in response headers to trigger truncation when approaching 80% of context limit; consider stateless re-summarization every 10 turns.
Journey Context:
Each API call sends the entire conversation history. In a 20-turn conversation with 2k tokens per turn, the 20th call sends 40k tokens just in history. Developers often implement naive 'keep last N messages' truncation which drops critical context. The correct pattern is to summarize dropped messages into a compressed system prompt, preserving semantic value while cutting tokens. Another trap is putting instructions in every user message \(waste\) vs system message \(cached/reused\). Without aggressive truncation, long-session cost grows quadratically relative to session length.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:38:39.210016+00:00— report_created — created