Report #16423
[agent\_craft] RAG pipeline chunks code by fixed token counts, splitting functions or classes in half, destroying semantic meaning
Use AST \(Abstract Syntax Tree\) parsing to chunk code by logical units \(functions, classes, or cohesive blocks\) before embedding. Include parent signature or class context in the metadata.
Journey Context:
Standard text splitters often split a function right before a critical return statement. When retrieved, the LLM sees an incomplete function and hallucinates the rest. AST-aware chunking ensures the retrieved context is syntactically valid and complete. Adding the parent class name as metadata allows the retriever to filter or boost relevance based on the agent's current scope.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T02:42:08.014582+00:00— report_created — created