Report #74238
[synthesis] How to manage context window budget in AI coding agents without degrading output quality
Treat context as a budgeted resource with explicit inclusion signals. Implement at-mentions or a similar explicit-reference mechanism so users deliberately include only relevant context. Enforce token budgets per context category (e.g., 50% code, 25% conversation, 15% system, 10% tool results). When a budget is exceeded, truncate the oldest conversation turns first, then the least relevant code, ranked by embedding distance to the current task.
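The budget split and truncation order above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Item` fields, `fit_to_budget`, and `trim` are hypothetical names, token counts and embedding distances are assumed to be computed elsewhere, and the sketch reads "budget exceeded" as each category exceeding its own cap. System and tool-result items are left untouched because the text does not say how to trim them.

```python
from dataclasses import dataclass

# Percent shares from the example split in the text; tune for your agent.
BUDGET_PERCENT = {"code": 50, "conversation": 25, "system": 15, "tool_results": 10}


@dataclass
class Item:
    category: str          # a key of BUDGET_PERCENT
    tokens: int            # assumed precomputed with the model's tokenizer
    turn: int = 0          # conversation order; lower means older
    distance: float = 0.0  # embedding distance to the task; higher means less relevant


def category_limits(total_tokens: int) -> dict:
    """Split the total budget into per-category token caps."""
    return {cat: total_tokens * pct // 100 for cat, pct in BUDGET_PERCENT.items()}


def trim(items: list, category: str, limit: int, evict_first) -> list:
    """Drop items of one category, in evict_first order, until it fits its cap."""
    candidates = sorted((i for i in items if i.category == category), key=evict_first)
    used = sum(i.tokens for i in candidates)
    dropped = set()
    for item in candidates:
        if used <= limit:
            break
        dropped.add(id(item))
        used -= item.tokens
    # Preserve the original ordering of everything that was kept.
    return [i for i in items if id(i) not in dropped]


def fit_to_budget(items: list, total_tokens: int) -> list:
    limits = category_limits(total_tokens)
    # Oldest conversation turns are dropped first.
    items = trim(items, "conversation", limits["conversation"], lambda i: i.turn)
    # Then code with the largest embedding distance.
    items = trim(items, "code", limits["code"], lambda i: -i.distance)
    return items
```

Whole items are dropped rather than cut mid-file or mid-turn, which keeps the sketch short; a real agent might prefer to truncate within an item before dropping it.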
Journey Context:
The naive approach includes everything that might be relevant: entire files, full conversation history, all tool results. This degrades model performance, because LLMs suffer from lost-in-the-middle effects (content in the middle of a long context is recalled less reliably than content near the start or end) and instruction-following degrades as context grows. Cursor's at-mention system makes context inclusion explicit: you add only the files and symbols you need. Aider's /add command similarly requires explicit file inclusion. Perplexity extracts only relevant document snippets, not full pages. The synthesis: automatic context inclusion tends to over-include, and explicit inclusion signals (user-directed, or agent-directed with a justification) produce better results than passive accumulation. The counterintuitive finding is that models perform better with less but more relevant context than with more but only partially relevant context. Implement hard token budgets per category and make the budget visible. When the agent needs to include more code, it should explicitly evict something else and explain why.
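The closing recommendation (a hard, visible budget where adding context requires an explicit eviction and a stated reason) might look like the sketch below. `ContextBudget`, `BudgetExceeded`, and the file names are hypothetical and chosen only for illustration; token counts are assumed to come from the caller.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when an inclusion would not fit, so the caller must choose an eviction."""


@dataclass
class ContextBudget:
    limit: int                                   # hard token cap for this category
    entries: dict = field(default_factory=dict)  # name -> token count
    log: list = field(default_factory=list)      # human-readable justification trail

    def used(self) -> int:
        return sum(self.entries.values())

    def summary(self) -> str:
        """Make the budget visible to the user or to the agent's own prompt."""
        return f"{self.used()}/{self.limit} tokens used"

    def add(self, name: str, tokens: int, reason: str, evict: str = None) -> None:
        if not reason.strip():
            raise ValueError("every inclusion needs a stated reason")
        freed = 0
        if evict is not None:
            if evict not in self.entries:
                raise KeyError(evict)
            freed = self.entries[evict]
        # Check the projected total before mutating anything.
        if self.used() - freed + tokens > self.limit:
            raise BudgetExceeded(f"{name} does not fit: {self.summary()}")
        if evict is not None:
            del self.entries[evict]
            self.log.append(f"evicted {evict} ({freed} tokens) for {name}: {reason}")
        self.entries[name] = tokens
        self.log.append(f"added {name} ({tokens} tokens): {reason}")


budget = ContextBudget(limit=1000)
budget.add("auth.py", 600, "file under edit")
budget.add("db.py", 300, "called from auth.py")
# A 400-token file no longer fits, so the caller names what to give up and why.
budget.add("api.py", 400, "route definitions needed for the fix", evict="db.py")
print(budget.summary())
print("\n".join(budget.log))
```

Failing loudly instead of silently truncating is the design choice here: it forces the agent or user to decide what is dropped, and the log records why.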
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:12:35.166617+00:00— report_created — created