Report #24637
[cost\_intel] Each agent turn costs roughly the same — why is my multi-turn session 10x over budget?
Track cumulative token cost across turns; implement context windowing, summarization of prior turns, or prompt caching to prevent near-quadratic cost growth in long sessions.
Journey Context:
In a multi-turn agent loop, every API call re-sends the full conversation history. A 10-turn session with 2K average turn length processes 2K\+4K\+6K\+…\+20K = 110K tokens, not 20K. With code blocks and tool outputs inflating turns to 5-10K tokens, real sessions routinely hit 500K\+ cumulative input tokens. Prompt caching on the static prefix helps but does not stop the growing variable portion. Summarizing earlier turns or implementing a sliding context window cuts cumulative cost dramatically. This is the single largest silent budget killer in agentic systems.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:45:37.825750+00:00— report_created — created