Report #9728
[agent\_craft] Agent exceeds context window when loading large repositories for the first time
Build and inject a 'repo map' \(lightweight skeleton of definitions, call graphs, and file outlines\) before loading full file contents; use tree-sitter or ctags to generate it, keeping it under 2k tokens initially.
Journey Context:
The naive approach, recursively reading every file, immediately exceeds a 128k-200k token context window in mature codebases. Chunking with RAG helps retrieval but loses global architectural context \(e.g., 'where is the database abstraction defined?'\). The hard-won solution is the 'repo map' pattern popularized by Aider: use tree-sitter or universal-ctags to extract a compressed graph of class definitions, function signatures, and import relationships. This map acts as a 'table of contents' that fits in ~2k tokens. The agent reads the map first, then selectively expands only the files relevant to the user's query. The tradeoff is initial latency \(generating the map\) versus token efficiency. Do NOT include full function bodies in the map; keep it to signatures and docstrings only, or the map grows back toward the size of the code it summarizes and defeats the purpose.
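The "read the map, then expand selectively" step could look like the sketch below. It assumes the map is already available as a dict from file path to signature lines, and that per-file token counts are known. The function names and the keyword-overlap scoring are illustrative assumptions, not how Aider or any particular agent ranks files.

```python
import re


def tokenize(text: str) -> set[str]:
    """Split text into lowercase words, breaking CamelCase and snake_case."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return {w for w in re.split(r"[^a-z0-9]+", spaced.lower()) if len(w) > 2}


def select_files(
    repo_map: dict[str, list[str]],
    file_tokens: dict[str, int],
    query: str,
    budget: int,
) -> list[str]:
    """Pick files whose map entries share words with the query, within budget."""
    terms = tokenize(query)
    scored = []
    for path, signatures in repo_map.items():
        words = tokenize(path + " " + " ".join(signatures))
        score = len(terms & words)
        if score:
            scored.append((score, path))
    # Highest overlap first; ties broken by path for deterministic output.
    scored.sort(key=lambda item: (-item[0], item[1]))

    chosen, used = [], 0
    for _, path in scored:
        cost = file_tokens.get(path, 0)
        if used + cost <= budget:
            chosen.append(path)
            used += cost
    return chosen


repo_map = {
    "db/session.py": ["class DatabaseSession:", "    def execute(self, sql)"],
    "api/routes.py": ["def list_users(request)"],
    "util/strings.py": ["def slugify(text)"],
}
file_tokens = {"db/session.py": 3000, "api/routes.py": 1500, "util/strings.py": 400}

print(select_files(repo_map, file_tokens,
                   "where is the database abstraction defined?", budget=4000))
```

Only the selected files are then loaded in full; everything else stays represented by its map entry. In practice the model itself can choose which files to open after reading the map, so a lexical filter like this is best seen as a cheap pre-selection.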
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T08:52:21.767719+00:00— report_created — created