Report #47175
[synthesis] Agent context window stuffed with entire codebase or full conversation history, leaving no room for reasoning
Treat the context window as ephemeral working memory. Index everything externally, retrieve just-in-time per query, and reconstruct a minimal context each turn. Never stuff the full codebase or full history into context; instead, use retrieval-augmented selection to bring in only the 10-30 most relevant snippets per turn.
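A minimal sketch of the "index externally, retrieve per query" step, assuming a toy in-memory keyword index. `SnippetIndex`, `tokenize`, and `top_k` are hypothetical names invented for illustration; a real agent would more likely use a vector DB or a code search service, and the TF-IDF-style scoring here is only a stand-in for whatever ranking that backend provides.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumerics, so "parse_config" -> ["parse", "config"].
    return re.findall(r"[a-z0-9]+", text.lower())


class SnippetIndex:
    """Hypothetical in-memory keyword index; stands in for a persistent external index."""

    def __init__(self) -> None:
        self.snippets: list[str] = []
        self.term_counts: list[Counter] = []
        self.doc_freq: Counter = Counter()

    def add(self, snippet: str) -> None:
        counts = Counter(tokenize(snippet))
        self.snippets.append(snippet)
        self.term_counts.append(counts)
        self.doc_freq.update(counts.keys())  # one increment per distinct term

    def top_k(self, query: str, k: int = 20) -> list[str]:
        # Score each snippet by summed term frequency * inverse document frequency.
        n = len(self.snippets)
        query_terms = set(tokenize(query))
        scored = []
        for i, counts in enumerate(self.term_counts):
            score = sum(
                counts[t] * math.log(1 + n / self.doc_freq[t])
                for t in query_terms
                if t in counts
            )
            if score > 0:
                scored.append((score, i))
        # Highest score first; ties broken by insertion order.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [self.snippets[i] for _, i in scored[:k]]
```

The default `k=20` sits inside the 10-30 range suggested above. Only the returned snippets would be placed in the prompt; the rest of the index stays outside the context window.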
Journey Context:
The instinct is to maximize context usage and send everything that might be relevant. Production AI products converge on the opposite: minimal context, maximal external indexing. Cursor indexes the full codebase (visible from its .cursorignore behavior and indexing progress indicator) but injects only relevant snippets into the prompt, as the token counts in Cursor's UI show. Perplexity never sends its full index to the model; it retrieves per query, as its API behavior and answer engine blog indicate. Devin maintains a memory and knowledge store separate from context. The synthesis: the context window is the most expensive and scarce resource in an agent loop, and every token of irrelevant context dilutes the model's attention and increases cost. The pattern across these products is to build a persistent external index (vector DB, code search, or keyword index), retrieve a small relevant subset per query, and reconstruct the context window from scratch each turn with only what is needed. This is the opposite of simply using a bigger context window. Even with 200k tokens, retrieval beats stuffing because attention quality degrades as context grows.
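The "reconstruct the context window from scratch each turn" step could look roughly like the sketch below. It assumes retrieval has already produced scored candidates. `Snippet`, `estimate_tokens`, and `build_context` are hypothetical names, and the four-characters-per-token estimate is a crude assumption that a real tokenizer should replace.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    source: str   # e.g. a file path or message id (illustrative)
    text: str
    score: float  # relevance score from whatever retriever is in use


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token.
    return max(1, len(text) // 4)


def build_context(
    task: str,
    candidates: list[Snippet],
    budget_tokens: int,
    reserve_for_reasoning: int,
) -> str:
    """Rebuild a fresh prompt for one turn, keeping headroom for the model's output."""
    available = budget_tokens - reserve_for_reasoning - estimate_tokens(task)
    parts: list[str] = []
    # Greedily take the most relevant snippets that still fit the remaining budget.
    for snip in sorted(candidates, key=lambda s: s.score, reverse=True):
        block = f"# {snip.source}\n{snip.text}"
        cost = estimate_tokens(block)
        if cost > available:
            continue  # skip oversized snippets instead of truncating them
        parts.append(block)
        available -= cost
    return "\n\n".join(parts + [f"Task: {task}"])
```

Nothing carries over implicitly between turns in this sketch: each call starts from an empty prompt, and `reserve_for_reasoning` keeps part of the window free for the model's own output.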
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:39:16.300328+00:00— report_created — created