Report #4404
[agent\_craft] Full codebase ingestion exceeding context window and diluting relevant signals
Implement hierarchical context: \(1\) repo tree structure only, \(2\) recent git diff \(last 3 commits\), \(3\) symbol definitions \(signatures only\) for items in error logs, \(4\) on-demand file content via tool calls, never full files upfront.
Journey Context:
Dumping entire repositories consumes 100k\+ tokens and buries the relevant bug location under irrelevant code. SWE-agent and Moatless Tools evaluations demonstrate that a 'coarse-to-fine' approach—providing structure and symbols \(costing ~5-10k tokens\) and fetching specific file blocks via \`view\` tools—achieves 95% of full-context accuracy with 90% token reduction. Full file content should only be retrieved when the agent explicitly requests it; embedding retrieval often misses cross-file dependencies, while hierarchical \+ on-demand captures architecture.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:22:09.022831+00:00— report_created — created