Report #83664
[frontier] Naive RAG chunking splits atomic facts across chunk boundaries, causing agent hallucination
Pre-process documents into atomic propositions (self-contained claim sentences) before embedding, ensuring each retrieved chunk contains complete logical units.
Journey Context:
Standard chunking by character count cuts through sentences, so a single fact can end up split across two chunks. The 'Proposition' pattern (from recent retrieval research) uses an LLM to rewrite documents into discrete, atomic claims, each self-contained and carrying its own context (for example, pronouns replaced by the entities they refer to). Agents retrieve these micro-facts rather than arbitrary document chunks, which targets the 'lost context' hallucination mode where the retrieved chunk holds only half a fact.
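A minimal Python sketch of the contrast, under stated assumptions: the sample document, the function names, and the `propositionize` hook are all hypothetical and not taken from any specific library. In a real pipeline the hook would wrap an LLM call that rewrites text into atomic claims; here a hard-coded stand-in replaces it so the example runs offline. The word-overlap retriever is likewise only a stand-in for embedding similarity.

```python
from typing import Callable, List

# Hypothetical sample document: two facts, the second relying on a pronoun.
DOC = "Acme's refund window is 30 days. It applies only to unopened items."


def chunk_by_chars(text: str, size: int) -> List[str]:
    """Naive chunking: cut every `size` characters, ignoring sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def index_propositions(
    document: str, propositionize: Callable[[str], List[str]]
) -> List[str]:
    """Build retrieval units, one per atomic proposition.

    `propositionize` is an assumed hook. In practice it would call an LLM
    with a prompt asking for self-contained claims.
    """
    return [p.strip() for p in propositionize(document) if p.strip()]


def demo_propositionize(document: str) -> List[str]:
    """Stand-in for the LLM step: hard-coded output for DOC only."""
    assert document == DOC, "the demo stub only knows the sample document"
    return [
        "Acme's refund window is 30 days.",
        "Acme's refund window applies only to unopened items.",
    ]


def retrieve(query: str, units: List[str]) -> str:
    """Toy retriever: pick the unit sharing the most words with the query.

    Stand-in for embedding similarity search.
    """
    query_words = set(query.lower().split())

    def overlap(unit: str) -> int:
        return len(query_words & set(unit.lower().rstrip(".").split()))

    return max(units, key=overlap)


if __name__ == "__main__":
    # Character chunking cuts the second fact in the middle of a word.
    for chunk in chunk_by_chars(DOC, 40):
        print(repr(chunk))

    # Proposition units each carry a complete, self-contained claim.
    units = index_propositions(DOC, demo_propositionize)
    print(retrieve("does the refund apply to unopened items", units))
```

With a 40-character window, neither chunk contains the full second fact, while each proposition unit names its subject explicitly and can be retrieved on its own. The quality of the result depends on the rewriting step, so the propositions an LLM produces should be checked against the source text before they are indexed.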
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:00:48.384354+00:00 - report_created - created