Report #40509
[agent_craft] Including full 2000-line files in context exceeds token limits and buries relevant function definitions
Parse the file into an AST and extract function and class boundaries. Retrieve only the nodes referenced in the import graph or returned by semantic search, and present just those chunks with their original line numbers so edits can target exact lines.
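A minimal sketch of the boundary-extraction step, assuming the target file is Python and using the standard-library `ast` module in place of tree-sitter (Python 3.9+). The names `Span`, `extract_spans`, and `render_chunk` are hypothetical helpers for illustration and are not part of any existing tool.

```python
import ast
from typing import NamedTuple


class Span(NamedTuple):
    name: str
    start: int  # 1-indexed first line, including decorators
    end: int    # 1-indexed last line, inclusive


def extract_spans(source: str) -> list[Span]:
    """Return the line span of every function and class definition in the source."""
    spans = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so the chunk is syntactically complete.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            spans.append(Span(node.name, start, node.end_lineno))
    return sorted(spans, key=lambda s: s.start)


def render_chunk(source: str, start: int, end: int) -> str:
    """Render lines start..end with their original line numbers as a prefix."""
    lines = source.splitlines()
    return "\n".join(f"{n:>4}| {lines[n - 1]}" for n in range(start, end + 1))
```

Because every rendered line keeps its original number, an agent can refer to an exact range in the full file even though it only saw a chunk. Nested definitions (for example, methods inside a class) are reported as separate spans that overlap their parent, so a caller would need to decide which level to present.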
Journey Context:
Simple sliding-window chunking splits function bodies across chunk boundaries, destroying syntactic coherence. Repository-aware agents should instead use tree-sitter or a similar parser to identify function spans, then prioritize: (1) the current selection, (2) callers and callees in the call graph, (3) type definitions. This can reduce a 2000-line file to a relevant subset of roughly 150 lines. Empirical studies on SWE-bench reportedly show that AST-aware retrieval improves patch accuracy by 35% over naive line-range selection. The agent must include line numbers with each chunk so that replacement blocks can target exact lines.
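The three-tier prioritization can be sketched as follows. This is an illustrative approximation, assuming a single Python file, top-level definitions only, and a call graph built by matching called names with the standard-library `ast` module (no tree-sitter, no cross-file resolution). The function names `index_definitions` and `select_chunks` and the `budget_lines` parameter are hypothetical.

```python
import ast


def index_definitions(source: str) -> dict:
    """Map each top-level definition to (kind, start, end, names it calls)."""
    index = {}
    for node in ast.parse(source).body:  # top-level definitions only
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        calls = set()
        for inner in ast.walk(node):
            if isinstance(inner, ast.Call):
                # Name-based matching: f(...) or obj.f(...) both record "f".
                if isinstance(inner.func, ast.Name):
                    calls.add(inner.func.id)
                elif isinstance(inner.func, ast.Attribute):
                    calls.add(inner.func.attr)
        kind = "class" if isinstance(node, ast.ClassDef) else "function"
        index[node.name] = (kind, node.lineno, node.end_lineno, calls)
    return index


def select_chunks(source: str, selected: str, budget_lines: int) -> list:
    """Pick spans by priority: selection, then callers/callees, then classes."""
    index = index_definitions(source)
    selected_calls = index[selected][3]

    def priority(name):
        kind, _, _, calls = index[name]
        if name == selected:
            return 0  # (1) current selection
        if kind == "class":
            return 2  # (3) type definitions
        if name in selected_calls or selected in calls:
            return 1  # (2) callees or callers of the selection
        return None   # unrelated: never included

    ranked = sorted(
        (priority(name), start, end, name)
        for name, (_, start, end, _) in index.items()
        if priority(name) is not None
    )
    chosen, used = [], 0
    for rank, start, end, name in ranked:
        size = end - start + 1
        # The selection is always kept; everything else must fit the budget.
        if rank == 0 or used + size <= budget_lines:
            chosen.append((name, start, end))
            used += size
    # Present chunks in file order so line numbers increase monotonically.
    return sorted(chosen, key=lambda c: c[1])
```

Calling `select_chunks(source, "target", budget_lines=150)` would return `(name, start, end)` tuples in file order. Name-based call matching can produce false positives when unrelated functions share a name, which is one reason a real implementation would likely rely on a proper parser and symbol resolution.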
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:27:59.456720+00:00— report_created — created