Report #39730
[agent_craft] Full file content in context causes truncation of critical cross-file dependencies in large repositories
Implement repo-map tiering: file tree + symbol signatures (1k tokens) → RAG-retrieved chunks (4k tokens) → full content only for active edit files
Journey Context:
The naive approach of dumping every file's contents into the context window fails at scale: a 200-file repository easily exceeds 100k tokens, since that is only about 500 tokens per file. Once the context limit is hit, truncation often removes the very imports or type definitions the agent needs. The fix is a three-tier hierarchy modeled on the approaches used by Aider and Cody. Tier 1 is a repo map: a compressed representation of the file tree plus function and class signatures (not full bodies), typically under 1k tokens. Tier 2 uses retrieval-augmented generation (RAG): embed the user's request and fetch the top-k most relevant code chunks (e.g., 4k tokens in total). Tier 3 is full file content, but only for the files currently being edited (usually fewer than 5). This keeps the active context under about 8k tokens (roughly 1k for the map, 4k for retrieved chunks, and the remainder for the files being edited) while maintaining global awareness. Critical: the repo map must be updated after every file operation, or the agent will hallucinate stale import paths.
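The three tiers described above can be sketched roughly as follows. This is a minimal illustration, not how Aider or Cody implement it: the function names (`build_repo_map`, `retrieve_chunks`, `build_context`) are hypothetical, the token count is a crude 4-characters-per-token estimate, the signature extraction only handles Python sources via the standard `ast` module, and Tier 2 uses plain keyword overlap as a stand-in for embedding-based retrieval.

```python
import ast
import re


def approx_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); assumption, not a real tokenizer."""
    return (len(text) + 3) // 4


def file_signatures(source: str) -> list[str]:
    """Return function/class signatures without bodies (Python sources only)."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return []  # unparseable file: list it in the map without symbols
    defs = (ast.FunctionDef, ast.AsyncFunctionDef)
    sigs = []
    for node in tree.body:
        if isinstance(node, defs):
            args = ", ".join(a.arg for a in node.args.args)
            sigs.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            sigs.append(f"class {node.name}")
            for item in node.body:
                if isinstance(item, defs):
                    args = ", ".join(a.arg for a in item.args.args)
                    sigs.append(f"    def {item.name}({args})")
    return sigs


def build_repo_map(files: dict[str, str], budget: int = 1000) -> str:
    """Tier 1: file paths plus signatures, cut off at the token budget."""
    lines, used = [], 0
    for path in sorted(files):
        entry = [path] + ["  " + s for s in file_signatures(files[path])]
        for line in entry:
            cost = approx_tokens(line)
            if used + cost > budget:
                return "\n".join(lines)
            lines.append(line)
            used += cost
    return "\n".join(lines)


def retrieve_chunks(query: str, files: dict[str, str], exclude: set[str],
                    budget: int = 4000, chunk_lines: int = 20) -> list[str]:
    """Tier 2: rank fixed-size chunks by keyword overlap with the query.

    Keyword overlap stands in for embedding similarity in this sketch.
    """
    words = set(re.findall(r"\w+", query.lower()))
    scored = []
    for path, source in files.items():
        if path in exclude:
            continue
        lines = source.splitlines()
        for i in range(0, len(lines), chunk_lines):
            chunk = "\n".join(lines[i:i + chunk_lines])
            score = len(words & set(re.findall(r"\w+", chunk.lower())))
            if score:
                scored.append((score, path, i + 1, chunk))
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    picked, used = [], 0
    for _, path, start, chunk in scored:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            continue  # skip chunks that do not fit; smaller ones may still fit
        picked.append(f"# {path}:{start}\n{chunk}")
        used += cost
    return picked


def build_context(query: str, files: dict[str, str], active: list[str]) -> str:
    """Assemble all three tiers; the map is rebuilt on every call, so it is never stale."""
    parts = ["## Repo map", build_repo_map(files), "## Retrieved chunks"]
    parts.extend(retrieve_chunks(query, files, exclude=set(active)))
    parts.append("## Active files")  # Tier 3: full content for edited files only
    parts.extend(f"# {path}\n{files[path]}" for path in active)
    return "\n\n".join(parts)
```

Calling `build_context(request, files, active=[...])` with `files` as a path-to-source mapping returns one prompt string. Because the map is recomputed from the current `files` on each call, a renamed function or moved module shows up in the next prompt; a real implementation would more likely cache the map and invalidate only the entries for files that changed.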
Lifecycle
2026-06-18T21:09:36.917192+00:00— report_created — created