Report #54332
[synthesis] How to provide codebase context to LLMs without exceeding context window limits
Use two-stage retrieval. First, run a semantic search to identify candidate files. Second, parse those files into an abstract syntax tree (AST) and represent each one in the prompt only by its signatures and docstrings. Include the full source code only for the file being modified.
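A minimal Python sketch of the two-stage pipeline, under stated assumptions: the codebase is Python, so the standard-library `ast` module can produce the skeletons, and stage 1 uses a toy bag-of-words cosine similarity as a stand-in for a real embedding model. The function names (`find_candidates`, `skeleton`, `build_prompt`) are hypothetical, not part of any tool's API.

```python
import ast
import math
import re
from collections import Counter


def _bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding model: counts lowercase identifiers/words."""
    return Counter(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_candidates(task: str, files: dict[str, str], k: int = 3) -> list[str]:
    """Stage 1: rank files by similarity to the task description, keep the top k."""
    query = _bag_of_words(task)
    ranked = sorted(files, key=lambda p: _cosine(query, _bag_of_words(files[p])), reverse=True)
    return ranked[:k]


def skeleton(source: str) -> str:
    """Stage 2: reduce a Python file to class/function signatures plus docstring summaries."""
    lines: list[str] = []

    def add_doc(node: ast.AST, indent: str) -> None:
        doc = ast.get_docstring(node)
        if doc:
            lines.append(f'{indent}"""{doc.splitlines()[0]}"""')  # first docstring line only

    def visit(nodes: list[ast.stmt], indent: str) -> None:
        for node in nodes:
            if isinstance(node, ast.ClassDef):
                bases = ", ".join(ast.unparse(b) for b in node.bases)
                lines.append(f"{indent}class {node.name}({bases}):" if bases else f"{indent}class {node.name}:")
                add_doc(node, indent + "    ")
                visit(node.body, indent + "    ")  # recurse into methods, not their bodies
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kw = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                ret = f" -> {ast.unparse(node.returns)}" if node.returns else ""
                lines.append(f"{indent}{kw} {node.name}({ast.unparse(node.args)}){ret}:")
                add_doc(node, indent + "    ")
                lines.append(indent + "    ...")  # implementation elided

    visit(ast.parse(source).body, "")
    return "\n".join(lines)


def build_prompt(task: str, files: dict[str, str], edited_path: str, k: int = 3) -> str:
    """Skeletons for the top-k related files; full source only for the edited file."""
    others = {p: s for p, s in files.items() if p != edited_path}
    parts = [f"# Task\n{task}\n"]
    for path in find_candidates(task, others, k):
        parts.append(f"# {path} (signatures only)\n{skeleton(others[path])}\n")
    parts.append(f"# {edited_path} (full source)\n{files[edited_path]}")
    return "\n".join(parts)
```

In a real setup the bag-of-words ranking would be replaced by embedding similarity, and non-Python files would need a multi-language parser (for example tree-sitter) instead of `ast`. The shape of the prompt stays the same: related files contribute only their interfaces, while the edited file contributes its full implementation.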
Journey Context:
Naive retrieval-augmented generation (RAG) over a codebase either retrieves too little (just the current file) or too much (whole files, which blow up the context window and confuse the model). Cursor's indexing behavior and Copilot's @workspace feature both rely on AST parsing to provide 'skeleton' context (function signatures, class definitions) for surrounding files, reserving most of the context budget for the implementation actually being edited. This maximizes the signal-to-noise ratio of the prompt.
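The budget split described above can be sketched as a greedy packer: the edited file's full source is charged against the budget first, then pre-ranked skeletons are added in relevance order while they still fit. This is a hedged illustration, not how Cursor or Copilot implement it. Assumptions: token counts are approximated at roughly 4 characters per token (a real system should use the target model's tokenizer), the skeletons are already computed and ranked, and the names `approx_tokens`, `ContextPlan`, and `pack_context` are hypothetical.

```python
from dataclasses import dataclass, field


def approx_tokens(text: str) -> int:
    # Assumption: ~4 characters per token. Replace with the target model's tokenizer.
    return (len(text) + 3) // 4


@dataclass
class ContextPlan:
    edited_path: str
    edited_source: str
    skeletons: list[tuple[str, str]] = field(default_factory=list)  # (path, skeleton) kept
    dropped: list[str] = field(default_factory=list)                # paths that did not fit
    tokens_used: int = 0


def pack_context(edited_path: str, edited_source: str,
                 ranked_skeletons: list[tuple[str, str]], budget: int) -> ContextPlan:
    """Reserve budget for the edited file first, then add skeletons in relevance order."""
    plan = ContextPlan(edited_path, edited_source, tokens_used=approx_tokens(edited_source))
    if plan.tokens_used > budget:
        raise ValueError(f"{edited_path} alone needs ~{plan.tokens_used} tokens (budget {budget})")
    for path, skel in ranked_skeletons:
        cost = approx_tokens(skel)
        if plan.tokens_used + cost <= budget:
            plan.skeletons.append((path, skel))
            plan.tokens_used += cost
        else:
            plan.dropped.append(path)  # skip it; a smaller, later skeleton may still fit
    return plan
```

Skipping an oversized skeleton instead of stopping lets smaller, lower-ranked skeletons use the leftover budget; stopping at the first miss would keep strict relevance order at the cost of some unused space.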
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:41:41.356160+00:00— report_created — created