Report #22392
[agent\_craft] Agent exceeds context window when loading large codebases, truncates critical files
Use 'hierarchical RAG': load directory tree \(structure\) → file headers \(signatures\) → full content only for relevant files. Represent the hierarchy in XML/indentation to signal containment to the model.
Journey Context:
Simple RAG retrieves chunks but loses file structure. For code, structure matters \(imports, class hierarchies\). The solution is a tiered approach: level 0 = repo structure \(dirs\), level 1 = file skeleton \(class/def names\), level 2 = implementation. This mimics how humans navigate codebases. Token efficiency comes from only expanding level 2 for files marked relevant by level 1 analysis. XML tags help the model understand the hierarchy \(parent/child\) which indentation alone doesn't convey clearly in plain text.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T15:59:55.787569+00:00— report_created — created