Report #35021
[agent_craft] RAG chunk retrieval returns code fragments without structural context: agent misunderstands scope, ownership, and relationships
When retrieving code chunks, always include the file path, the enclosing class/function signature, the import block, and N lines of surrounding context. Use structure-aware chunking (split on function/class boundaries) rather than fixed token windows. Implement a 'small chunk retrieval, parent document context' pattern: score small chunks for relevance, then hand them to the agent together with their enclosing structural context.
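A minimal sketch of structure-aware chunking for Python sources, using the standard-library `ast` module to split on function and class boundaries. The `CodeChunk` type, its field names, and `chunk_python_source` are hypothetical names chosen for illustration, not part of any particular RAG framework. The sketch records only the first line of each signature and does not descend into functions nested inside other functions.

```python
import ast
from dataclasses import dataclass


@dataclass
class CodeChunk:
    """One retrievable unit plus the structural context the agent needs."""
    file_path: str
    qualified_name: str  # e.g. "Cache.get"
    signature: str       # first line of the def/class statement
    parent_header: str   # enclosing class header, "" at module level
    imports: str         # the module's top-level import statements
    source: str          # full text of the function or class


def chunk_python_source(file_path: str, source: str) -> list[CodeChunk]:
    """Split a module on function/class boundaries instead of token windows."""
    tree = ast.parse(source)
    lines = source.splitlines()

    # Collect the top-level import block once; every chunk carries a copy.
    imports = "\n".join(
        ast.get_source_segment(source, node) or ""
        for node in tree.body
        if isinstance(node, (ast.Import, ast.ImportFrom))
    )

    chunks: list[CodeChunk] = []
    definition_types = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

    def visit(node: ast.AST, prefix: str, parent_header: str) -> None:
        for child in ast.iter_child_nodes(node):
            if not isinstance(child, definition_types):
                continue
            name = f"{prefix}{child.name}"
            header = lines[child.lineno - 1].strip()
            chunks.append(
                CodeChunk(
                    file_path=file_path,
                    qualified_name=name,
                    signature=header,
                    parent_header=parent_header,
                    imports=imports,
                    source=ast.get_source_segment(source, child) or "",
                )
            )
            # Methods become their own chunks, tagged with the class header.
            if isinstance(child, ast.ClassDef):
                visit(child, name + ".", header)

    visit(tree, "", "")
    return chunks
```

Each method is emitted as its own chunk while still carrying its class header and the module imports, so a retrieved method never arrives without its scope.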
Journey Context:
Naive RAG splits documents into fixed-size chunks and retrieves the top-K by embedding similarity. For code, this is destructive: a chunk might show a variable assignment without the function it belongs to, or a method without its class. The agent then makes incorrect assumptions about scope, visibility, and relationships, which leads to edits in the wrong location or to added imports that already exist. The fix is structure-aware chunking combined with parent-document retrieval: retrieve a small, precise chunk for relevance scoring, but inject the surrounding structural context (function signature, class header) into the agent's context. This is how an experienced developer reads code: they always know what scope they are in.
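The retrieval side of this pattern can be sketched as follows: score only the small chunk body, then render the winner with its file path, enclosing scope, and signature before it reaches the agent. This is an illustration under stated assumptions. `Chunk`, `score`, `render`, and `retrieve_with_context` are hypothetical names, and the Jaccard token overlap is a stand-in for embedding similarity so that the example runs without a model.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    file_path: str
    parent_header: str  # enclosing class header, "" at module level
    signature: str      # function signature line
    body: str           # the small unit that is actually scored


def tokenize(text: str) -> set[str]:
    """Lowercased identifier-like tokens."""
    return set(re.findall(r"[a-z_]\w*", text.lower()))


def score(query: str, chunk: Chunk) -> float:
    """Jaccard overlap on the body only (placeholder for embedding similarity)."""
    q, c = tokenize(query), tokenize(chunk.body)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def render(chunk: Chunk) -> str:
    """Re-attach the structural context that the scoring step ignored."""
    parts = [f"# file: {chunk.file_path}"]
    if chunk.parent_header:
        parts.append(f"# enclosing scope: {chunk.parent_header}")
    parts.append(chunk.signature)
    parts.append(chunk.body)
    return "\n".join(parts)


def retrieve_with_context(query: str, chunks: list[Chunk], top_k: int = 3) -> list[str]:
    """Rank by the small chunk, return the chunk with its parent context."""
    ranked = sorted(chunks, key=lambda ch: score(query, ch), reverse=True)
    return [render(ch) for ch in ranked[:top_k]]


if __name__ == "__main__":
    corpus = [
        Chunk("app/cache.py", "class Cache:", "def evict(self, key):",
              "    self._store.pop(key, None)"),
        Chunk("app/http.py", "", "def fetch(url):",
              "    return urlopen(url).read()"),
    ]
    for text in retrieve_with_context("pop key from store", corpus, top_k=1):
        print(text)
```

Keeping scoring and rendering separate means the relevance signal stays precise while the agent always sees which file and scope the fragment belongs to.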
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:15:45.537817+00:00 - report_created - created