Report #22456
[synthesis] Stuffing entire codebase files into context causes the LLM to get lost in the middle and exceed token limits
Use semantic indexing to retrieve only the top-k relevant code snippets (functions/classes) rather than whole files, and provide a summarized dependency graph.
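A minimal sketch of the top-k retrieval step, for illustration only. It assumes snippets have already been extracted and uses a bag-of-words cosine similarity as a crude stand-in for a real embedding model; the snippet names, the SNIPPETS data, and the helper functions are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercased alphabetic tokens; a crude stand-in for an embedding vector."""
    return Counter(t.lower() for t in re.findall(r"[A-Za-z]+", text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, snippets: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k snippets most similar to the query."""
    q = tokenize(query)
    ranked = sorted(
        snippets,
        key=lambda name: cosine(q, tokenize(snippets[name])),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical pre-extracted snippets, keyed by a made-up qualified name.
SNIPPETS = {
    "auth.login": "def login(user, password): return check_password(user, password)",
    "db.connect": "def connect(url): return open_database_connection(url)",
    "auth.hash_password": "def hash_password(password): return sha256(password)",
}

QUERY = "where is the password checked at login"
RELEVANT = top_k(QUERY, SNIPPETS, k=2)
```

Only the snippets named in RELEVANT would be placed in the prompt; the unrelated database code stays out of the context window. A production setup would swap the term-count vectors for embeddings from an actual model.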
Journey Context:
LLMs recall information poorly when it is buried in the middle of a very long context, so the bigger the haystack, the harder the needle is to find. Cursor's Codebase Indexing builds a local index, chunks code by AST boundaries, and retrieves only the relevant chunks. It also uses an LLM to summarize the repository structure (similar in purpose to Aider's repo map) so the agent knows what exists without reading all of it. This maximizes the signal-to-noise ratio in the context window.
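The two ideas above, chunking on AST boundaries and giving the agent a compact map of what exists, can be sketched with Python's standard ast module. This is an assumption-laden illustration, not Cursor's or Aider's actual implementation: it handles Python source only, and the repo map here is a static signature outline rather than an LLM-written summary. SOURCE, chunk_by_ast, and repo_map are hypothetical names.

```python
import ast

# Hypothetical file contents used only for this example.
SOURCE = '''
import os

def load(path):
    """Read a file."""
    with open(path) as f:
        return f.read()

class Cache:
    def get(self, key):
        return self.store.get(key)

    def put(self, key, value):
        self.store[key] = value
'''

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def chunk_by_ast(source: str) -> list[tuple[str, str]]:
    """Return (name, code) pairs, one per top-level function or class."""
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, FUNC_TYPES + (ast.ClassDef,)):
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks


def signature(node: ast.AST) -> str:
    """Render 'def name(arg, ...)' without the body (positional args only)."""
    args = ", ".join(a.arg for a in node.args.args)
    return f"def {node.name}({args})"


def repo_map(source: str, filename: str) -> str:
    """Build a compact outline of a file: signatures only, no bodies."""
    lines = [f"{filename}:"]
    for node in ast.parse(source).body:
        if isinstance(node, FUNC_TYPES):
            lines.append(f"  {signature(node)}")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"  class {node.name}")
            for item in node.body:
                if isinstance(item, FUNC_TYPES):
                    lines.append(f"    {signature(item)}")
    return "\n".join(lines)


CHUNKS = chunk_by_ast(SOURCE)
OUTLINE = repo_map(SOURCE, "example.py")
```

Each entry in CHUNKS is a complete function or class, so a retrieved chunk never cuts a definition in half. OUTLINE costs a few lines of context per file yet still tells the agent which symbols it can ask to see in full.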
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:06:05.216418+00:00— report_created — created