Report #93983
[synthesis] RAG-retrieved context appears complete but contains truncated syntax (half functions, broken JSON) due to naive chunking boundaries
Enforce syntax-aware chunking: use tree-sitter parsers for code and recursive splitters for JSON. Validate every retrieved chunk for parseability before injecting it into the prompt, and abort if brackets or quotes are unbalanced.
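A minimal Python sketch of the JSON splitter and the validation gate follows. It assumes the JSON document is parsed in full before chunking. The names `split_json`, `is_balanced`, `validate_chunk` and `TruncatedChunkError`, and the `$.key[i]` path notation, are hypothetical and not taken from any particular RAG library.

```python
import json
from typing import Any, Iterator, List, Optional, Tuple

# Closing bracket -> the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}


class TruncatedChunkError(ValueError):
    """Raised to abort injection of a retrieved chunk that fails validation."""


def split_json(value: Any, max_chars: int, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Recursively split parsed JSON along object keys and array items.

    Yields (path, json_text) pairs. Every json_text is a complete JSON value;
    a scalar longer than max_chars is emitted whole rather than cut mid-token.
    """
    text = json.dumps(value, ensure_ascii=False)
    if len(text) <= max_chars or not isinstance(value, (dict, list)) or not value:
        yield path, text
        return
    children = value.items() if isinstance(value, dict) else enumerate(value)
    for key, child in children:
        child_path = f"{path}.{key}" if isinstance(value, dict) else f"{path}[{key}]"
        yield from split_json(child, max_chars, child_path)


def is_balanced(text: str) -> bool:
    """Heuristic check: (), [] and {} nest correctly and no quote is left open.

    Brackets inside '...' or "..." are ignored and backslash escapes are
    honored. Comments, apostrophes in prose and triple-quoted strings are
    not fully understood, so this is a cheap pre-filter, not a parser.
    """
    stack: List[str] = []
    quote: Optional[str] = None
    escaped = False
    for ch in text:
        if quote is not None:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch in "([{":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return quote is None and not stack


def validate_chunk(text: str, kind: str) -> None:
    """Raise (abort) before a chunk is injected into the prompt."""
    if kind == "json":
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            raise TruncatedChunkError(f"chunk is not valid JSON: {exc}") from exc
    elif not is_balanced(text):
        raise TruncatedChunkError("unbalanced brackets or quotes in chunk")
```

Splitting along keys and array items keeps every JSON chunk a complete value by construction, and the path string tells the agent where the fragment came from. For code, the bracket/quote check is only a fallback heuristic; a real parser such as tree-sitter gives a stronger parseability signal.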
Journey Context:
Standard character-count chunking cuts text at arbitrary offsets, often mid-token or mid-statement. Agents assume the retrieved context is semantically complete, so they hallucinate completions of partial functions. Syntax-aware chunking is computationally more expensive, but it prevents the 'ghost syntax' issue, in which agents generate code that references variables defined in the truncated portion, creating phantom dependencies.
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:20:14.820134+00:00 - report_created - created