Report #25058
[cost_intel] Re-sending full conversation history on every agent turn without compaction, causing quadratic cost growth
Implement sliding-window context management: summarize older turns into a compact state, keep the last 3-5 turns verbatim, and cap total history at a token budget. Because each turn then sends a bounded amount of history, total input tokens across a session grow as O(n) instead of O(n²).
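A minimal sketch of the sliding-window approach described above, not a definitive implementation. The names `Turn`, `CompactingHistory`, `estimate_tokens`, and `toy_summarize` are hypothetical; the 4-characters-per-token estimate is a crude assumption, and a real summarizer would call a model rather than truncate text.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Swap in a real tokenizer.
    return (len(text) + 3) // 4


def toy_summarize(previous: str, turns: List[Turn]) -> str:
    # Placeholder summarizer: keeps the first 40 characters of each folded turn
    # and caps the summary at 400 characters. A real one would ask a model for
    # key decisions, current file state, and outstanding tasks.
    notes = [f"{t.role}: {t.text[:40]}" for t in turns]
    return " | ".join(filter(None, [previous] + notes))[-400:]


@dataclass
class CompactingHistory:
    summarize: Callable[[str, List[Turn]], str]
    keep_recent: int = 4        # turns kept verbatim
    token_budget: int = 8000    # cap on summary + verbatim turns
    summary: str = ""
    recent: List[Turn] = field(default_factory=list)

    def total_tokens(self) -> int:
        return estimate_tokens(self.summary) + sum(
            estimate_tokens(t.text) for t in self.recent
        )

    def add(self, turn: Turn) -> None:
        self.recent.append(turn)
        # Fold turns that fall outside the verbatim window into the summary.
        overflow = len(self.recent) - self.keep_recent
        if overflow > 0:
            old, self.recent = self.recent[:overflow], self.recent[overflow:]
            self.summary = self.summarize(self.summary, old)
        # If still over budget, fold the oldest verbatim turns one at a time,
        # always keeping at least the newest turn.
        while self.total_tokens() > self.token_budget and len(self.recent) > 1:
            oldest = self.recent.pop(0)
            self.summary = self.summarize(self.summary, [oldest])

    def context(self) -> List[Turn]:
        # What gets sent each turn: one summary message plus the recent turns.
        head = []
        if self.summary:
            head = [Turn("system", "Summary of earlier turns: " + self.summary)]
        return head + list(self.recent)


history = CompactingHistory(summarize=toy_summarize)
for i in range(20):
    history.add(Turn("user", f"turn {i} " + "x" * 200))
print(len(history.context()), history.total_tokens())
```

The per-turn payload stays bounded by the summary size plus `keep_recent` turns, which is what keeps session-wide growth linear. Note that the budget loop cannot shrink a single oversized turn or an oversized summary; a production version would need to handle both.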
Journey Context:
In a multi-turn agent session, turn N re-sends all N-1 previous turns, so total tokens sent across N turns are proportional to N². A 20-turn session with 2K tokens per turn sends about 380K tokens of history (2K × (0 + 1 + ... + 19) = 2K × 190). With Sonnet at $3/M input, that is roughly $1.14 in history alone, for a single session. The fix is context compaction: after K turns, summarize the conversation into a compact state (key decisions, current file state, outstanding tasks) and continue with the summary plus recent turns. This trades some fidelity for a large cost reduction. The sweet spot is keeping the last 3-5 turns verbatim (preserving immediate context) and compressing everything earlier into a structured summary. Prompt caching helps with the system prompt but doesn't solve history growth.
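The history arithmetic above can be checked with a short script. This is an illustrative sketch: the function names are hypothetical, the 1,000-token summary size is an assumed figure, and the compacted estimate ignores the cost of the summarization calls themselves.

```python
def history_tokens_full_resend(turns: int, tokens_per_turn: int) -> int:
    # Turn k re-sends the k-1 earlier turns in full.
    return sum((k - 1) * tokens_per_turn for k in range(1, turns + 1))


def history_tokens_compacted(
    turns: int, tokens_per_turn: int, keep_recent: int, summary_tokens: int
) -> int:
    # Turn k sends at most keep_recent earlier turns verbatim, plus a
    # fixed-size summary once older turns exist.
    total = 0
    for k in range(1, turns + 1):
        prior = k - 1
        verbatim = min(prior, keep_recent)
        summary = summary_tokens if prior > keep_recent else 0
        total += verbatim * tokens_per_turn + summary
    return total


def input_cost_usd(tokens: int, usd_per_million: float = 3.0) -> float:
    return tokens * usd_per_million / 1_000_000


full = history_tokens_full_resend(turns=20, tokens_per_turn=2000)
compacted = history_tokens_compacted(
    turns=20, tokens_per_turn=2000, keep_recent=4, summary_tokens=1000
)
print(full, input_cost_usd(full))            # 380000 tokens, about $1.14
print(compacted, input_cost_usd(compacted))  # 155000 tokens, about $0.47
```

Under these assumptions the gap widens with session length, since the full-resend total grows quadratically while the compacted total grows linearly.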
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:27:53.721812+00:00— report_created — created