Report #58350
[synthesis] How do AI code editors fit relevant context from massive codebases into the limited context window of LLMs?
Use Tree-sitter to parse each file into a syntax tree, extract symbol definitions and references, and build a compressed 'repo map' that gives the LLM the codebase's structure without dumping entire files into the prompt.
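A minimal sketch of the extraction step, assuming Python source files. To keep it runnable with no dependencies, it swaps Tree-sitter for Python's built-in `ast` module; the idea is the same: walk the syntax tree, keep class and function signatures, drop bodies, and separately record which names each file references. The function names (`definitions`, `references`, `repo_map`) are hypothetical, not taken from Aider or Cursor.

```python
from __future__ import annotations

import ast

# Stand-in for Tree-sitter: Python's stdlib parser (works for Python files only).
FUNC_NODES = (ast.FunctionDef, ast.AsyncFunctionDef)


def signature(fn: ast.FunctionDef | ast.AsyncFunctionDef) -> str:
    # Keep only the name and positional parameter names; the body is discarded.
    params = ", ".join(a.arg for a in fn.args.args)
    return f"def {fn.name}({params})"


def definitions(source: str) -> list[str]:
    """Top-level classes/functions plus class methods, as indented signature lines."""
    out: list[str] = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            out.append(f"class {node.name}")
            out.extend(f"    {signature(m)}" for m in node.body if isinstance(m, FUNC_NODES))
        elif isinstance(node, FUNC_NODES):
            out.append(signature(node))
    return out


def references(source: str) -> set[str]:
    """Names the module reads: bare identifiers in load context and attribute names."""
    refs: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            refs.add(node.id)
        elif isinstance(node, ast.Attribute):
            refs.add(node.attr)
    return refs


def repo_map(files: dict[str, str]) -> str:
    """Render a compressed, signature-only view of every file, sorted by path."""
    lines: list[str] = []
    for path in sorted(files):
        lines.append(f"{path}:")
        lines.extend(f"  {d}" for d in definitions(files[path]))
    return "\n".join(lines)
```

For a file containing `class Cart` with an `add(self, item, qty)` method, the map shows just `class Cart` and `def add(self, item, qty)`; the method body never reaches the prompt. The `references` sets are what let a tool link a caller in one file to a definition in another.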
Journey Context:
Naive RAG retrieves whole files or arbitrary chunks, which often lack the class definitions or imports needed to understand how a function fits into the broader codebase. Dumping the whole repo, on the other hand, exceeds context limits. The synthesis drawn from Aider's 'repo map' and Cursor's codebase indexing is to use static analysis (Tree-sitter) to build a highly compressed representation of the codebase. By sending only the class/method signatures and their relationships to the LLM, the agent can understand the architecture and then decide which specific implementations it needs to pull into context. This 'map + drill-down' pattern spends the context budget on reasoning over structure rather than on raw text.
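The 'map + drill-down' pattern can be sketched as two calls: one that packs the most relevant signatures into a fixed token budget, and one that returns the full source of a single symbol once the model asks for it. This is an illustrative sketch under several assumptions: it again uses Python's `ast` instead of Tree-sitter, ranks symbols by how many *other* files reference them (a deliberately simple heuristic, not the ranking any specific tool uses), and estimates tokens as roughly 4 characters each instead of calling a real tokenizer. All function names are hypothetical.

```python
from __future__ import annotations

import ast


def approx_tokens(text: str) -> int:
    # Hypothetical heuristic: ~4 characters per token. Real tools use the model's tokenizer.
    return max(1, len(text) // 4)


def top_level_defs(source: str) -> dict[str, ast.AST]:
    """Map each top-level class/function name to its AST node."""
    kinds = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    return {n.name: n for n in ast.parse(source).body if isinstance(n, kinds)}


def used_names(source: str) -> set[str]:
    """Bare identifiers the module reads."""
    return {n.id for n in ast.walk(ast.parse(source))
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}


def ranked_map(files: dict[str, str], budget: int) -> list[str]:
    """The 'map': most cross-referenced symbols first, stopping when the budget is full."""
    uses = {path: used_names(src) for path, src in files.items()}
    scored: list[tuple[int, str]] = []
    for path, src in files.items():
        for name, node in top_level_defs(src).items():
            # Score = number of other files that mention this symbol by name.
            refs = sum(name in names for p, names in uses.items() if p != path)
            kind = "class" if isinstance(node, ast.ClassDef) else "def"
            scored.append((refs, f"{path}: {kind} {name}"))
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties by text

    lines: list[str] = []
    spent = 0
    for _, line in scored:
        cost = approx_tokens(line)
        if spent + cost > budget:
            break  # keep strict rank order; stop at the first entry that does not fit
        lines.append(line)
        spent += cost
    return lines


def drill_down(files: dict[str, str], path: str, name: str) -> str | None:
    """The 'drill-down': full source of one symbol, fetched only when requested."""
    src = files[path]
    node = top_level_defs(src).get(name)
    return None if node is None else ast.get_source_segment(src, node)
```

With a small budget, only the heavily referenced `User` class from a `models.py` file would make it into the map; if the model then asks for `User`, `drill_down(files, "models.py", "User")` returns its complete definition. Ranking by raw reference count is crude; a graph-based ranking over the definition/reference edges is a natural refinement.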
Lifecycle
2026-06-20T04:25:53.260188+00:00— report_created — created