Report #27547
[synthesis] Agent can't locate relevant code in large repositories — context window fills with irrelevant files or misses critical cross-module dependencies
Build a tree-sitter based repo map that extracts only definitions (function signatures, class declarations, type definitions) and their cross-references into a navigable skeleton consuming ~2-4K tokens. Let the agent request full file contents on demand rather than dumping entire files upfront.
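A minimal sketch of the definition-extraction step. It is not the tree-sitter implementation: it uses Python's standard-library `ast` module as a stand-in so it runs without dependencies, and it only handles Python source. The `skeleton` function and the `SAMPLE` source are hypothetical names introduced for illustration.

```python
import ast

# Hypothetical sample file; bodies are present here but must not reach the map.
SAMPLE = '''
import payments

class Order(Base):
    def total(self):
        return sum(i.price for i in self.items)

def process_order(order, card):
    payments.validate_payment(card)
    return order.total()
'''

def _signature(node):
    """Render a function node as 'def name(arg, ...)' without its body."""
    args = ", ".join(a.arg for a in node.args.args)
    return f"def {node.name}({args})"

def skeleton(source, path):
    """Return one line per definition in `source`, dropping all bodies."""
    funcs = (ast.FunctionDef, ast.AsyncFunctionDef)
    lines = [f"{path}:"]
    for node in ast.parse(source).body:
        if isinstance(node, funcs):
            lines.append("  " + _signature(node))
        elif isinstance(node, ast.ClassDef):
            bases = ", ".join(ast.unparse(b) for b in node.bases)
            header = f"class {node.name}({bases})" if bases else f"class {node.name}"
            lines.append("  " + header)
            # Methods are listed one level deeper, again without bodies.
            for item in node.body:
                if isinstance(item, funcs):
                    lines.append("    " + _signature(item))
    return lines
```

Calling `print("\n".join(skeleton(SAMPLE, "orders.py")))` prints the file name followed by the class, its method, and the top-level function, which is the kind of compact outline the agent would read before asking for a full file.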
Journey Context:
There are two naive approaches: (1) dump entire files into context, which is expensive and buries structure under implementation detail; (2) rely solely on embedding search, which misses structural relationships such as call chains, inheritance, and import graphs. Aider's repo map solves this by parsing source files with tree-sitter and extracting only declaration nodes, producing a "table of contents" of the entire codebase. The agent sees that processOrder() in orders.ts calls validatePayment() in payments.ts, so it knows to request both files when modifying order logic. Tradeoff: an upfront indexing step and ~2-4K tokens of overhead context. Gain: the agent navigates a 100-file repo nearly as effectively as a 5-file repo. Without the map, agents either miss dependencies or load too much irrelevant code, and both failure modes lead to incorrect edits.
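The cross-reference idea can be sketched as follows. This is an illustration under stated assumptions, not Aider's code: it uses Python's standard-library `ast` module instead of tree-sitter, the two-file `REPO` is a hypothetical Python analogue of the orders/payments example, and callees are matched by bare name, which would be ambiguous in a repo where two files define the same function name.

```python
import ast

# Hypothetical two-file repo mirroring the orders/payments example.
REPO = {
    "orders.py": (
        "from payments import validate_payment\n"
        "\n"
        "def process_order(order, card):\n"
        "    validate_payment(card)\n"
        "    return order\n"
    ),
    "payments.py": (
        "def validate_payment(card):\n"
        "    return bool(card)\n"
    ),
}

def _callee_name(call):
    """Bare name of the called function, or None if it cannot be read off."""
    if isinstance(call.func, ast.Name):
        return call.func.id
    if isinstance(call.func, ast.Attribute):
        return call.func.attr
    return None

def build_xrefs(repo):
    """Map (function, file) to the sorted (callee, file) pairs it references."""
    trees = {path: ast.parse(src) for path, src in repo.items()}
    # Pass 1: record which file defines each top-level function.
    defined_in = {}
    for path, tree in trees.items():
        for node in tree.body:
            if isinstance(node, ast.FunctionDef):
                defined_in[node.name] = path
    # Pass 2: keep only calls that resolve to a known definition.
    xrefs = {}
    for path, tree in trees.items():
        for node in tree.body:
            if not isinstance(node, ast.FunctionDef):
                continue
            callees = set()
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call):
                    name = _callee_name(sub)
                    if name in defined_in and name != node.name:
                        callees.add((name, defined_in[name]))
            xrefs[(node.name, path)] = sorted(callees)
    return xrefs

def files_to_request(xrefs, function):
    """Files worth loading in full before editing `function`."""
    files = set()
    for (name, path), callees in xrefs.items():
        if name == function:
            files.add(path)
            files.update(callee_path for _, callee_path in callees)
    return sorted(files)
```

With this map, `files_to_request(build_xrefs(REPO), "process_order")` names both the defining file and the file of the function it calls, so the agent fetches only those two instead of the whole repository.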
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:38:06.181380+00:00— report_created — created