Report #84741
[synthesis] Stuffing maximum context into prompts degrades model performance and increases cost without proportionally improving output quality
Treat the context window as a budgeted resource with an explicit allocation. A starting split: roughly 10% for the system prompt and task definition, roughly 40% for retrieved or selected context, roughly 30% for conversation history (summarized as it grows), and roughly 20% reserved for output. Prefer explicit context selection mechanisms such as at-references over implicit whole-codebase inclusion, and put a context selection and compression layer in front of the model call.
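As a minimal sketch of what that allocation might look like in code, the snippet below splits a window size into the four buckets. The function name `allocate_budget`, the bucket names, and the 8,000-token window are illustrative assumptions, not part of any product's API; the percentages are the rough figures from the recommendation and would need tuning per task.

```python
# Rough percentage split from the recommendation; treat these as starting values.
BUDGET_PERCENT = {
    "system_and_task": 10,
    "selected_context": 40,
    "history": 30,
    "output_reserve": 20,
}


def allocate_budget(window_tokens: int, percents: dict[str, int] = BUDGET_PERCENT) -> dict[str, int]:
    """Split a context window into per-bucket token budgets (hypothetical helper)."""
    if sum(percents.values()) != 100:
        raise ValueError("bucket percentages must sum to 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    budget = {name: window_tokens * pct // 100 for name, pct in percents.items()}
    # Hand any rounding remainder to the output reserve so the buckets
    # always sum exactly to the window size.
    budget["output_reserve"] += window_tokens - sum(budget.values())
    return budget


if __name__ == "__main__":
    for bucket, tokens in allocate_budget(8000).items():
        print(f"{bucket}: {tokens} tokens")
```

Keeping the output reserve as an explicit bucket means a long history or a large retrieval result cannot silently eat the space the model needs to answer.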
Journey Context:
The instinct is to include as much context as possible, but several successful products show the opposite pattern. Cursor's at-reference system makes users explicitly select the context that matters instead of including everything. Devin's memory system compresses learned information into compact summaries. Copilot Workspace's plan step pre-processes and compresses context before generation. The synthesis across these products: context quality matters far more than context quantity. Models degrade when the relevant material is buried in irrelevant context, in part because of the lost-in-the-middle effect (information placed in the middle of a long context is used less reliably), and every token of context adds cost and latency. The architectural implication is that you need a context selection and compression layer in front of the model, not just a retrieval layer. This is why Cursor indexes your codebase but does not include all of it: it selects the relevant chunks. Products that stuff entire files into context get worse results than those that carefully select the right 2,000 tokens, even when the right information is somewhere in the stuffed context.
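The selection and compression layer described above could be sketched roughly as follows. This is a toy illustration under stated assumptions, not how Cursor, Devin, or Copilot Workspace implement it: `count_tokens` approximates tokens by whitespace-separated words, `relevance` is a stand-in keyword-overlap scorer where a real system would likely use embeddings or a code index, and `compress_history` inserts a placeholder where a real system would call a summarizer. All names here are hypothetical.

```python
from dataclasses import dataclass


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    # Swap in the tokenizer of the model you actually call.
    return len(text.split())


@dataclass
class Chunk:
    source: str
    text: str


def relevance(query: str, chunk: Chunk) -> int:
    # Stand-in scorer: number of distinct query words that appear in the chunk.
    return len(set(query.lower().split()) & set(chunk.text.lower().split()))


def select_context(query: str, chunks: list[Chunk], budget_tokens: int) -> list[Chunk]:
    """Greedily pack the highest-scoring chunks into the token budget."""
    ranked = sorted(chunks, key=lambda c: relevance(query, c), reverse=True)
    selected: list[Chunk] = []
    used = 0
    for chunk in ranked:
        if relevance(query, chunk) == 0:
            break  # remaining chunks share no words with the query
        cost = count_tokens(chunk.text)
        if used + cost <= budget_tokens:
            selected.append(chunk)
            used += cost
    return selected


def compress_history(turns: list[str], budget_tokens: int) -> list[str]:
    """Keep the most recent turns that fit; mark where older turns were dropped."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget_tokens:
            break
        kept.append(turn)
        used += cost
    kept.reverse()
    dropped = len(turns) - len(kept)
    if dropped:
        # Placeholder only: a real layer would summarize the dropped turns here.
        # The marker itself is not counted against the budget in this sketch.
        kept.insert(0, f"[{dropped} earlier turn(s) dropped; summary goes here]")
    return kept
```

The point of the sketch is the shape of the layer: retrieval produces candidates, and a separate step decides what actually fits in the budget, so irrelevant chunks are excluded even when there is room for them.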
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:49:45.350594+00:00— report_created — created