Report #8933
[agent_craft] Context window overflow when analyzing large codebases with flat file dumping
Implement hierarchical summarization: summarize leaf files first, then their directories, building a tree of summaries in which every summarization step fits within the context window.
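A minimal Python sketch of the file → directory recursion follows. It assumes the repository is already loaded as a nested dict (directory name → subtree, file name → contents). `SummaryNode`, `summarize_tree`, `count_tokens` and `summarize` are hypothetical names. `summarize` is only a placeholder that truncates instead of calling a model, and tokens are approximated by whitespace-separated words.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SummaryNode:
    path: str
    summary: str
    children: list[SummaryNode] = field(default_factory=list)


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (a crude proxy).
    return len(text.split())


def summarize(text: str, max_tokens: int) -> str:
    # Hypothetical placeholder for an LLM summarization call:
    # it just keeps the first `max_tokens` words.
    return " ".join(text.split()[:max_tokens])


def summarize_tree(path: str, node: str | dict, max_tokens: int = 200) -> SummaryNode:
    """Post-order walk: leaf files are summarized first, then each directory
    is summarized from its children's summaries."""
    if isinstance(node, str):
        # Leaf file: `node` holds the raw file contents.
        return SummaryNode(path, summarize(node, max_tokens))
    children = [
        summarize_tree(f"{path}/{name}", sub, max_tokens)
        for name, sub in sorted(node.items())
    ]
    # The directory sees only its children's summaries, never raw file contents.
    combined = "\n".join(f"{c.path}: {c.summary}" for c in children)
    return SummaryNode(path, summarize(combined, max_tokens), children)


repo = {
    "README.md": "Project overview and setup instructions",
    "src": {
        "parser.py": "def parse(tokens): builds an AST from tokens",
        "lexer.py": "def lex(text): splits source into tokens",
    },
}
root = summarize_tree("repo", repo, max_tokens=8)
print(root.summary)
```

Each directory is summarized from child summaries capped at `max_tokens`. As a result, the input to any single call grows with the number of children, not with raw file size. A directory with very many children can still overflow, so in practice its child summaries may also need to be batched.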
Journey Context:
Simply dumping all files into the context window hits the limit (around 100k tokens for most models). It also triggers the 'lost in the middle' effect, where the model overlooks critical content in the middle of a long prompt. Flat truncation instead discards important context that falls past the cutoff. Hierarchical summarization (file → directory → module) preserves the semantic relationships between files while compressing the token count. This pattern is used in repository-level code understanding systems like those described in Anthropic's long-context best practices.
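A directory or module may have so many children that even their concatenated summaries would overflow the window. In that case the summaries can be reduced in rounds instead of truncated:

1. Pack them greedily into batches that fit a token budget.
2. Summarize each batch.
3. Repeat until one summary remains.

A minimal sketch follows. It assumes each child summary already fits within the budget. `pack`, `reduce_summaries` and the word-count token proxy are illustrative, and `summarize` is a hypothetical stand-in for a model call.

```python
from __future__ import annotations


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    return len(text.split())


def summarize(text: str, max_tokens: int) -> str:
    # Hypothetical placeholder for an LLM call; keeps the first max_tokens words.
    return " ".join(text.split()[:max_tokens])


def pack(summaries: list[str], budget: int) -> list[list[str]]:
    """Greedily group consecutive summaries so each group's tokens fit in `budget`."""
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for s in summaries:
        n = count_tokens(s)
        if current and used + n > budget:
            batches.append(current)
            current, used = [], 0
        current.append(s)
        used += n
    if current:
        batches.append(current)
    return batches


def reduce_summaries(summaries: list[str], budget: int, summary_tokens: int) -> str:
    """Summarize budget-sized batches in rounds until a single summary remains.

    After the first round every summary has at most `summary_tokens` tokens.
    With 2 * summary_tokens <= budget, every batch but the last then holds at
    least two summaries, so each round shrinks the list and the loop ends.
    """
    if 2 * summary_tokens > budget:
        raise ValueError("summary_tokens must be at most budget / 2")
    if not summaries:
        return ""
    while True:
        batches = pack(summaries, budget)
        merged = [summarize("\n".join(b), summary_tokens) for b in batches]
        if len(merged) == 1:
            return merged[0]
        summaries = merged


# Illustrative input: 20 child summaries of 10 "tokens" each.
child_summaries = [" ".join(f"w{i}_{j}" for j in range(10)) for i in range(20)]
result = reduce_summaries(child_summaries, budget=40, summary_tokens=15)
print(result)
```

Unlike flat truncation, every child summary is passed to some summarization call rather than being cut off. The trade-off is extra calls, and the number of rounds grows roughly logarithmically with the number of children.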
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T06:48:16.688339+00:00— report_created — created