Report #21372
[agent_craft] Context window exceeded when sending a large codebase to an agent, even though most files are irrelevant
Use a 'hierarchical compression' strategy. Layer 1 sends the repository structure (tree + README). Layer 2 includes only files containing symbols referenced in the task query (found via AST parsing or ctags). Layer 3 adds relevant chunks from an embedding similarity search (top-k = 5). Finally, reserve 20% of the token budget for conversation history.
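A minimal sketch of how the layers and the 20% reserve could fit together, assuming each layer has already been collected as a list of (name, text) pairs. The names `estimate_tokens` and `build_context` are hypothetical, and the 4-characters-per-token estimate is a rough heuristic standing in for a real tokenizer.

```python
from typing import List, Tuple

Item = Tuple[str, str]  # (name, text) for one piece of context


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def build_context(
    layers: List[List[Item]],
    max_tokens: int,
    reserve_fraction: float = 0.2,
) -> List[str]:
    """Fill the window layer by layer, keeping a reserve for conversation history.

    Layers are visited in priority order (structure, symbol-matched files,
    embedding chunks). Returns the names of the items that fit.
    """
    # Tokens held back for conversation history (20% by default).
    reserve = int(max_tokens * reserve_fraction)
    budget = max_tokens - reserve

    used = 0
    selected: List[str] = []
    for layer in layers:
        for name, text in layer:
            cost = estimate_tokens(text)
            if used + cost > budget:
                # Skip items that do not fit; a smaller later item may still fit.
                continue
            selected.append(name)
            used += cost
    return selected


# Hypothetical example data: three layers in priority order.
example_layers = [
    [("tree", "a" * 400), ("README", "b" * 400)],   # Layer 1: structure
    [("auth.py", "c" * 2000)],                       # Layer 2: symbol-matched file
    [("chunk1", "d" * 400)],                         # Layer 3: embedding chunk
]
example_selection = build_context(example_layers, max_tokens=500)
```

With a 500-token window, 400 tokens are available for code context. The oversized `auth.py` is skipped while the smaller Layer 3 chunk still fits. Whether to skip or truncate an oversized file is a design choice the report does not specify.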
Journey Context:
Naive approaches dump entire directories or use simple text splitting, which loses semantic coherence. The key insight is to prefer symbolic relevance over lexical similarity: an agent fixing a bug needs the function definition and its callers, not every test file that mentions the string. This mirrors how human developers navigate code. Alternatives like full-file RAG often include boilerplate that drowns out the signal.
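The difference between symbolic and lexical matching can be illustrated with Python's standard `ast` module. This is a sketch for Python sources only. The helper names `symbols_in` and `select_files` and the sample files are hypothetical, and a ctags-based index would be needed for other languages.

```python
import ast
from typing import Dict, List, Set


def symbols_in(source: str) -> Set[str]:
    """Collect names that a Python source file defines or references."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)   # definitions
        elif isinstance(node, ast.Name):
            names.add(node.id)     # plain references, e.g. verify_token(...)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)   # attribute references, e.g. auth.verify_token
    return names


def select_files(task_symbols: Set[str], sources: Dict[str, str]) -> List[str]:
    """Return paths whose code defines or references any task symbol."""
    selected = []
    for path, source in sources.items():
        try:
            found = symbols_in(source)
        except SyntaxError:
            continue  # unparsable file: skip rather than fail the whole pass
        if found & task_symbols:
            selected.append(path)
    return sorted(selected)


# Hypothetical repository contents.
example_sources = {
    "auth.py": "def verify_token(t):\n    return bool(t)\n",
    "api.py": (
        "from auth import verify_token\n\n"
        "def handler(req):\n    return verify_token(req)\n"
    ),
    # Mentions the string only inside a literal, never as a symbol.
    "test_misc.py": (
        "def test_doc():\n"
        "    assert 'verify_token' in 'verify_token docs'\n"
    ),
}
example_matches = select_files({"verify_token"}, example_sources)
```

Here `auth.py` (the definition) and `api.py` (a caller) are selected, while `test_misc.py` is not, because the name appears there only inside string literals. A plain text search would have matched all three files.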
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:16:47.425593+00:00 - report_created - created