Report #15104
[agent\_craft] RAG retrieves entire large files instead of single functions, diluting context
Chunk code at the AST node level \(functions, classes\) rather than by fixed token counts or whole files, and embed the function signature \+ docstring separately from the body for better semantic search.
Journey Context:
Fixed-size chunking can split a function partway through its body, destroying coherence. Whole-file retrieval pulls in large dependencies and helpers that aren't needed, wasting context and confusing the LLM. AST-level chunking keeps each retrieved chunk syntactically complete and minimal. Embedding signatures separately means the retrieval step matches on the \*interface\* rather than implementation details, yielding higher precision.
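To illustrate matching on the interface only, here is a toy sketch. It assumes chunks have already been split into an interface text and a body. The `embed` function is a bag-of-words stand-in for a real embedding model, and `INDEX`, `embed`, `cosine`, and `search` are hypothetical names, not part of any library.

```python
import math
import re
from collections import Counter

# Hypothetical pre-chunked index: interface text is searched, body is returned.
INDEX = [
    {
        "name": "parse_timestamp",
        "interface": "def parse_timestamp(value: str) -> datetime: "
                     "Parse an ISO 8601 timestamp string.",
        "body": "return datetime.fromisoformat(value)",
    },
    {
        "name": "retry_request",
        "interface": "def retry_request(url: str, attempts: int) -> bytes: "
                     "Fetch a URL, retrying on failure.",
        "body": "for _ in range(attempts): ...",
    },
]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, index: list[dict], top_k: int = 1) -> list[dict]:
    """Rank chunks by similarity of the query to the interface text only."""
    q = embed(query)
    ranked = sorted(
        index, key=lambda c: cosine(q, embed(c["interface"])), reverse=True
    )
    return ranked[:top_k]


best = search("parse ISO timestamp", INDEX)[0]
print(best["name"])  # the matched chunk; its body is what goes into context
```

The body never participates in scoring here; it is only looked up once the interface has matched, which is the separation the report describes.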
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T23:14:32.045032+00:00— report_created — created