Report #40033
[synthesis] How do AI coding agents handle large codebases and file modifications without dropping code or exceeding context limits?
Build an external index of the repository to retrieve relevant snippets. For edits, prompt the model to output search-and-replace blocks or diffs, and apply them programmatically, rather than asking the model to output the entire modified file.
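A minimal sketch of the indexing half, assuming a Python-only repository and a plain symbol-name lookup. The names build_symbol_index and retrieve_snippets are hypothetical, and real tools may use richer parsing or embeddings; this only illustrates retrieving a small snippet instead of sending a whole file.

```python
import ast
from pathlib import Path


def build_symbol_index(root):
    """Map each top-level function/class name to its (file, start, end) locations.

    Hypothetical helper: indexes only top-level definitions in *.py files.
    """
    index = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                location = (str(path), node.lineno, node.end_lineno)
                index.setdefault(node.name, []).append(location)
    return index


def retrieve_snippets(index, name):
    """Return the source text of every indexed definition called `name`."""
    snippets = []
    for file, start, end in index.get(name, []):
        lines = Path(file).read_text(encoding="utf-8").splitlines()
        # ast line numbers are 1-based and inclusive
        snippets.append("\n".join(lines[start - 1:end]))
    return snippets
```

Only the retrieved snippets, not the full files, would then go into the model's prompt.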
Journey Context:
A naive approach is to feed the LLM the whole file. This fails once a file no longer fits in the context window, and it wastes tokens even when it does fit. The observable behavior of tools like Aider and Cursor suggests they use a retrieval step to build a map of the repo. Furthermore, asking an LLM to regenerate a 1000-line file just to change 3 lines often leads to dropped code. The architectural shift is to have the model emit edit blocks, which the orchestrator code applies to the local file system. Because the model outputs only the changed regions, this sharply reduces output tokens, improves speed, and lowers error rates.
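A minimal sketch of the orchestrator side, assuming the model is prompted to wrap each edit in SEARCH/REPLACE markers. The marker strings and the function names parse_edit_blocks and apply_edit_blocks are assumptions for illustration, not a description of any specific tool's internals. Each search text must match exactly once, so an ambiguous or stale edit fails loudly instead of silently changing the wrong code.

```python
import re

# Assumed marker format; the model would be prompted to emit edits in this shape.
BLOCK_RE = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)>>>>>>> REPLACE",
    re.DOTALL,
)


def parse_edit_blocks(reply):
    """Extract (search, replace) pairs from the model's reply text."""
    blocks = []
    for search, replace in BLOCK_RE.findall(reply):
        # Drop the single newline that precedes the closing marker.
        if replace.endswith("\n"):
            replace = replace[:-1]
        blocks.append((search, replace))
    return blocks


def apply_edit_blocks(text, blocks):
    """Apply each edit in order, requiring exactly one match per search text."""
    for search, replace in blocks:
        count = text.count(search)
        if count != 1:
            raise ValueError(f"search text matched {count} times, expected 1")
        text = text.replace(search, replace, 1)
    return text
```

In use, the orchestrator would read the file, call apply_edit_blocks(original, parse_edit_blocks(reply)), and write the result back only if no error was raised. On failure it can ask the model to retry with more surrounding context in the search text.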
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:39:57.601936+00:00— report_created — created