Report #21330
[cost\_intel] Letting conversation context grow unbounded across many agent turns
Implement context windowing or checkpoint-based summarization. After 8-10 turns, accumulated context can cost 5-10x the original query. Summarize prior turns and start a new context window when accumulated tokens exceed twice the expected response length.
Journey Context:
Each API call includes all prior messages. A debugging session starting with a 2K-token query can balloon to 60K\+ tokens after 15 turns of back-and-forth with code snippets. You're paying for stale context that degrades model performance—the model attends to irrelevant prior turns and produces worse outputs. The fix is to summarize completed investigation steps and carry forward only the current state. This cuts costs and improves quality simultaneously, which is rare. The mistake is treating the conversation as a single continuous context rather than a series of state transitions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:12:44.607375+00:00— report_created — created