Report #26308
[synthesis] How to handle large codebase context without exceeding token limits
Use a codebase indexing system (embeddings + AST parsing) to retrieve only the most relevant code snippets and assemble a dynamic context window from them, rather than sending whole files to the model.
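The AST half of such an index splits source files into syntactically complete units, so retrieval returns whole functions or classes rather than arbitrary line ranges. The sketch below is only an illustration. It uses Python's standard-library `ast` module as a stand-in for tree-sitter, which works across many languages. It handles only top-level definitions. The function name `chunk_python_source` and the chunk dictionary layout are hypothetical. It requires Python 3.9+.

```python
import ast


def chunk_python_source(source: str, path: str = "<memory>") -> list[dict]:
    """Split Python source into top-level function/class chunks (hypothetical helper).

    Stand-in for a tree-sitter based chunker; only top-level definitions are emitted.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno points at the def/class line; include any decorators above it.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno  # 1-based and inclusive
            chunks.append({
                "path": path,
                "name": node.name,
                "start": start,
                "end": end,
                "text": "\n".join(lines[start - 1:end]),
            })
    return chunks


SAMPLE = '''import os

def load(path):
    return open(path).read()

class Cache:
    def get(self, key):
        return None
'''

for c in chunk_python_source(SAMPLE, "util.py"):
    print(c["name"], c["start"], c["end"])  # prints "load 3 4", then "Cache 6 8"
```

Each chunk would then be embedded and stored with its path and line range, so a retrieved snippet can be traced back to its location in the repository.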
Journey Context:
Early agents simply read whole files, which breaks down on large repositories because the files quickly exceed the model's token limit. Cursor and open-source tools such as Continue.dev use a local indexer (tree-sitter for AST parsing, a local vector database for embeddings). They retrieve the top-K most relevant snippets and pass only those to the LLM. This keeps the context small, relevant, and within token limits. It also avoids the 'lost in the middle' effect, in which LLMs tend to overlook information buried in the middle of a long context.
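The retrieval step can be sketched as three stages: rank indexed chunks by similarity to the query, keep the top-K, and admit them into the prompt only while a token budget allows. This is a minimal, self-contained illustration under loud assumptions. Bag-of-words vectors with cosine similarity stand in for a real embedding model and vector database. The 4-characters-per-token rule is a rough heuristic, not a real tokenizer. All function names (`toy_embed`, `build_context`, and so on) are hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    # ASSUMPTION: sparse bag-of-words vector standing in for a real embedding model.
    # Identifiers are split on underscores/punctuation ("hash_password" -> hash, password).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def approx_tokens(text: str) -> int:
    # ASSUMPTION: ~4 characters per token; use the model's real tokenizer in practice.
    return max(1, len(text) // 4)


def build_context(query: str, chunks: list[dict], top_k: int = 3, token_budget: int = 200):
    """Return (selected_chunks, tokens_used) for the top-K chunks that fit the budget."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c["text"])), reverse=True)
    selected, used = [], 0
    for chunk in ranked[:top_k]:
        cost = approx_tokens(chunk["text"])
        if used + cost > token_budget:
            continue  # skip chunks that would overflow; a smaller one may still fit
        selected.append(chunk)
        used += cost
    return selected, used


CHUNKS = [
    {"path": "auth.py", "text": "def check_password(user, password):\n    return hash_password(password) == user.password_hash"},
    {"path": "auth.py", "text": "def hash_password(password):\n    return sha256(password.encode()).hexdigest()"},
    {"path": "render.py", "text": "def render_template(name, context):\n    return env.get_template(name).render(context)"},
]

selected, used = build_context("hash password", CHUNKS, top_k=2, token_budget=200)
print([c["path"] for c in selected], used)
```

Only the selected snippets, not the full files, would then be placed in the prompt. The budget check runs after ranking, so the model sees the most relevant code first even when the budget is tight.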
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:33:44.853314+00:00— report_created — created