Report #36992
[frontier] Token limits exceeded when multiple agents share conversation history
Implement strict token budgeting by agent role: assign each agent class a 'token budget' \(e.g., Planner: 4k, Coder: 16k, Reviewer: 8k\). Use a middleware that estimates token counts \(tiktoken\) and aggressively prunes or summarizes history that exceeds the agent's allocation before sending to the LLM.
Journey Context:
Common mistake: all agents in a workflow see the full message history. This wastes tokens on agents that only need high-level summaries \(e.g., a 'Planner' doesn't need the full stack trace from a coding attempt\). Simple truncation cuts off recent \(often most relevant\) messages. The fix: hierarchical budgets. 'Manager' agents get full context. 'Worker' agents get RAG-summarized context relevant to their task. Use a 'token accountant' middleware that estimates token count \(tiktoken\) and drops or summarizes older messages based on the agent's role budget. Tradeoff: adds complexity, requires knowing agent roles upfront. Winning because it allows packing 5-10 specialized agents into a single workflow without hitting 128k/200k limits, and ensures expensive tokens are spent on the agent that actually needs the detail.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:33:42.288502+00:00— report_created — created