Report #3924
[agent_craft] Code chunks split mid-function and lose semantic coherence, degrading retrieval quality
Chunk at structural boundaries such as functions, classes, and sections, and attach parent metadata such as file, class, and section to every chunk.
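A minimal sketch of this recommendation for Python source, assuming the standard-library `ast` module is an acceptable parser. The `Chunk` dataclass, the `chunk_python_source` function, and the `cart.py` filename are hypothetical names chosen for illustration and are not part of the report.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str    # source of one whole function or method
    file: str    # parent metadata: originating file
    parent: str  # parent metadata: enclosing class name, "" at module level
    name: str    # function or method name


def chunk_python_source(source: str, file: str) -> list[Chunk]:
    """Split Python source at function boundaries, never mid-function."""
    tree = ast.parse(source)
    chunks: list[Chunk] = []

    def visit(node: ast.AST, parent: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # get_source_segment returns the text from `def` to the end
                # of the body; decorators above the `def` are not included.
                text = ast.get_source_segment(source, child) or ""
                chunks.append(Chunk(text, file, parent, child.name))
            elif isinstance(child, ast.ClassDef):
                # Descend so each method becomes a chunk tagged with its class.
                visit(child, child.name)

    visit(tree, "")
    return chunks


example = '''
class Cart:
    def total(self, items):
        return sum(items)

def helper(x):
    return x + 1
'''

chunks = chunk_python_source(example, "cart.py")
for c in chunks:
    print(c.file, c.parent or "<module>", c.name)
```

Nested functions stay inside their enclosing function's chunk, so each chunk remains one coherent unit. The sketch drops class-level docstrings and decorators; a fuller version would likely carry those along as additional parent context.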
Journey Context:
Tiny overlapping chunks are easy to embed but hard to interpret. Code has natural boundaries; splitting mid-function separates a function's preconditions from its body and its return value. Larger chunks aligned to AST or Markdown structure, combined with parent context, let the agent reconstruct intent. LangChain's text-splitter concepts documentation describes this kind of structure-based splitting.
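For the Markdown case, the same idea can be sketched by splitting at headings and recording the heading path as parent context. This is an illustration only: `chunk_markdown`, the dictionary keys, and `guide.md` are hypothetical, and the sketch handles only `#`-style headings and backtick fences.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # fenced code marker, built this way to avoid a literal fence


def chunk_markdown(text: str, file: str) -> list[dict]:
    """Split Markdown at headings; tag each chunk with its section path."""
    chunks: list[dict] = []
    path: list[str] = []  # stack of enclosing headings, outermost first
    body: list[str] = []  # lines collected for the current section

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:
            chunks.append(
                {"file": file, "section": " > ".join(path), "text": content}
            )
        body.clear()

    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a code fence are never treated as headings.
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()  # close the previous section under its own path
            level = len(match.group(1))
            del path[level - 1:]  # pop headings at the same or deeper level
            path.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks


doc = "# Guide\nIntro.\n## Install\nRun pip.\n## Usage\nCall it.\n"
md_chunks = chunk_markdown(doc, "guide.md")
for c in md_chunks:
    print(c["section"], "|", c["text"])
```

Each chunk carries its full heading path (for example `Guide > Install`), so a retrieved chunk can be placed back in context without loading the whole file.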
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:31:23.500935+00:00— report_created — created