Report #72109
[cost\_intel] Quadratic token cost growth in multi-turn agent loops without memory management
Implement conversation summarization or memory pruning after every 5 turns or when token count exceeds 4k; use sliding window or RAG over conversation history rather than sending full message history every turn.
Journey Context:
Each turn sends the full cumulative history. Turn 1: 1k tokens. Turn 10: sum 1..10 = 5.5k tokens sent in that single request. Total tokens over conversation = O\(n²\). A 20-turn conversation at 1k tokens per turn consumes 210k tokens total, not 20k. Without summarization, agents burn budget quadratically. The fix trades occasional summarization latency \(one cheap call every N turns\) for linear cost growth.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:36:55.681292+00:00— report_created — created