Report #79068
[agent\_craft] Context window overflows when importing large codebases, causing truncation of critical recent files
Implement a two-tier context. Tier 1 holds raw code for files in the immediate working set \(the current task\); tier 2 holds LLM-generated summaries \(200-300 tokens each\) covering the broader repo structure. Place tier 1 at the very end of the prompt to exploit recency bias.
Journey Context:
Raw code dumps of large repos quickly exhaust the context limit, and naive truncation can then drop recent files, which are often the ones the current task depends on. Hierarchical summarization \(one of several repository-level context strategies, alongside retrieval-based systems such as RepoCoder\) compresses distant files into semantic summaries while preserving the exact text of active files. Cross-file dependencies survive in summary form, and editable files stay verbatim. The 'Lost in the Middle' effect is that models recall content at the start and end of a long prompt better than content in the middle, so the summaries should be placed earlier \(toward the middle, where weaker recall costs less\), while the working set occupies the very end for maximum recall. Reportedly, this approach maintains 95% of repo coverage without truncation in 100k-token windows.
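A minimal sketch of the two-tier assembly described above, under stated assumptions: `RepoFile`, `estimate_tokens`, and `build_context` are hypothetical names introduced for illustration, summaries are assumed to be pre-computed elsewhere, and the 4-characters-per-token estimate is a rough stand-in for a real tokenizer. It budgets the working set first so it is never truncated, fills the remaining budget with summaries, and emits the summaries before the verbatim files.

```python
from dataclasses import dataclass


@dataclass
class RepoFile:
    path: str
    text: str     # verbatim source (used for tier 1)
    summary: str  # pre-computed summary, assumed ~200-300 tokens (used for tier 2)


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; replace with a real tokenizer.
    return max(1, len(text) // 4)


def build_context(files, working_set, budget):
    """Return a prompt body: tier-2 summaries first, tier-1 raw code last."""
    tier1 = [f for f in files if f.path in working_set]
    tier2 = [f for f in files if f.path not in working_set]

    # Budget the working set first so the active files are never truncated.
    tier1_blocks = [f"### {f.path}\n{f.text}" for f in tier1]
    used = sum(estimate_tokens(b) for b in tier1_blocks)
    if used > budget:
        raise ValueError("working set alone exceeds the token budget")

    # Fill what is left with summaries; skip any summary that does not fit.
    tier2_blocks = []
    for f in tier2:
        block = f"### {f.path} (summary)\n{f.summary}"
        cost = estimate_tokens(block)
        if used + cost > budget:
            continue
        tier2_blocks.append(block)
        used += cost

    # Summaries go earlier; the verbatim working set sits at the very end.
    return "\n\n".join(tier2_blocks + tier1_blocks)
```

Usage: `build_context(files, {"src/app.py"}, budget=100_000)` returns a single string to append after the system instructions. The token count ignores the separators between blocks, so leaving some headroom in `budget` is advisable.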
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:18:44.395780+00:00— report_created — created