Report #61692
[agent\_craft] Agent exceeds the context window when given full repository files, or loses semantic context when the repository is split with naive chunking
Implement hierarchical summarization with three levels: Level 0 \(file tree\), Level 1 \(module summaries covering public APIs only\), and Level 2 \(full content of relevant files\). Build an aider-style repo map that lists definitions without bodies for every file, then inject full content only for files that match the user's intent \(found via grep or AST search\). Compress context files with 'outline mode', meaning function signatures \+ one-line docstrings, and keep the full implementation for active files.
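A minimal sketch of how the three levels might be assembled, assuming the repository has already been read into a mapping of path to source text. The names `build_context`, `files`, and `intent_terms` are hypothetical. Plain substring matching stands in for the grep/AST search, and a regex over `def`/`class` lines stands in for a real repo map.

```python
import re

# Matches Python definition lines only; a real repo map would use ctags or an AST.
DEF_LINE = re.compile(r"^[ \t]*(?:async[ \t]+def|def|class)[ \t]+\w+.*$", re.MULTILINE)


def build_context(files: dict[str, str], intent_terms: list[str]) -> str:
    """Assemble a file tree, definition lines, and full text of matching files."""
    terms = [t.lower() for t in intent_terms]

    # "Active" files mention an intent term in their path or content (grep stand-in).
    active = {
        path
        for path, src in files.items()
        if any(t in path.lower() or t in src.lower() for t in terms)
    }

    parts = ["[Level 0] file tree"]
    parts += sorted(files)

    parts.append("[Level 1] definitions only (context files)")
    for path in sorted(files):
        if path in active:
            continue  # active files appear in full below
        parts.append(f"{path}:")
        parts += [f"  {m.group(0).strip()}" for m in DEF_LINE.finditer(files[path])]

    parts.append("[Level 2] full content (active files)")
    for path in sorted(active):
        parts.append(f"--- {path} ---")
        parts.append(files[path].rstrip())

    return "\n".join(parts)
```

Calling `build_context(files, ["login"])` would list every path, show only the definition lines of files that never mention "login", and include the complete text of files that do. A token budget check before emitting Level 2 would be a natural addition but is left out here.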
Journey Context:
Dumping an entire repository into context \(even a 100k-token repo\) can consume most or all of the window, leaving little or no room for generation. Naive chunking \(for example, fixed 512-token chunks\) breaks semantic boundaries: it splits functions mid-body and separates code from the imports it depends on. The 'skeleton' approach recognizes that agents need awareness of existing patterns \(naming conventions, utility functions\) but not the implementation details of distant modules. Aider's repo-map \(originally ctags-based\) showed that definitions alone can give the agent enough 'vocabulary' to write consistent code. The hierarchical approach mirrors how a human developer works: know the directory structure, know the public interfaces, and dive deep only when necessary. The trade-off is up-front latency \(the repository must be indexed first\) in exchange for token efficiency.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:02:12.507826+00:00— report_created — created