Report #93770
[frontier] Naive RAG retrieves irrelevant documents for complex multi-part questions requiring synthesis across disparate sources
Use an LLM to decompose the query into 3-5 orthogonal sub-queries, execute the retrievals concurrently via asyncio.gather, then synthesize the results with explicit citation tracking and source attribution.
Journey Context:
Single-shot retrieval fails on complex questions ('Compare the Q3 revenue of Tesla and BYD regarding EV margins') because vector similarity finds documents about Tesla OR BYD OR margins, but rarely a document that contains the specific comparison. The fix is to treat retrieval as a planning problem: use an LLM to decompose the user query into 3-5 sub-queries that cover orthogonal aspects (e.g., 'Tesla Q3 revenue', 'BYD Q3 revenue', 'Tesla EV margins Q3', 'BYD EV margins Q3'), run the vector searches for all sub-queries concurrently (asyncio.gather), then use a synthesis LLM to merge the results with explicit citations. The decomposition prompt needs careful engineering so that the sub-queries are mutually exclusive and collectively exhaustive (MECE); overlapping sub-queries trigger redundant retrievals that waste tokens.
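The flow above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `decompose`, `search`, `retrieve_all`, `build_synthesis_prompt`, the `Passage` type, and the placeholder corpus are all hypothetical names introduced here. The LLM decomposition is hard-coded and the vector search is replaced by a word-overlap ranking so the example runs without external services. Only `asyncio.gather` comes from the report itself.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str
    sub_query: str  # the sub-query that surfaced this passage, kept for attribution


# Placeholder corpus (hypothetical); a real system would query a vector store.
CORPUS = {
    "tesla-rev": "Tesla Q3 revenue placeholder passage",
    "byd-rev": "BYD Q3 revenue placeholder passage",
    "tesla-margin": "Tesla EV margins Q3 placeholder passage",
    "byd-margin": "BYD EV margins Q3 placeholder passage",
}


async def decompose(question: str) -> list[str]:
    """Stand-in for the LLM decomposition call (assumption: hard-coded output)."""
    return [
        "Tesla Q3 revenue",
        "BYD Q3 revenue",
        "Tesla EV margins Q3",
        "BYD EV margins Q3",
    ]


async def search(sub_query: str, k: int = 2) -> list[Passage]:
    """Stand-in for vector search: ranks passages by shared-word count."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    terms = set(sub_query.lower().split())
    ranked = sorted(
        CORPUS.items(),
        key=lambda item: len(terms & set(item[1].lower().split())),
        reverse=True,
    )
    return [Passage(doc_id, text, sub_query) for doc_id, text in ranked[:k]]


async def retrieve_all(question: str) -> list[Passage]:
    """Decompose, search all sub-queries concurrently, then deduplicate."""
    sub_queries = await decompose(question)
    # gather returns results in the same order as the sub-queries.
    per_query = await asyncio.gather(*(search(q) for q in sub_queries))
    unique: dict[str, Passage] = {}
    for passages in per_query:
        for passage in passages:
            # Keep the first hit per document so it is cited only once.
            unique.setdefault(passage.doc_id, passage)
    return list(unique.values())


def build_synthesis_prompt(question: str, passages: list[Passage]) -> str:
    """Number each source so the synthesis LLM can cite it as [n]."""
    sources = "\n".join(
        f"[{i}] ({p.doc_id}) {p.text}" for i, p in enumerate(passages, 1)
    )
    return (
        "Answer using only the sources below and cite each claim as [n].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    q = "Compare the Q3 revenue of Tesla and BYD regarding EV margins"
    found = asyncio.run(retrieve_all(q))
    print(build_synthesis_prompt(q, found))
```

Deduplicating by `doc_id` before building the prompt is one way to limit the token waste that overlapping sub-queries cause; it does not replace a decomposition prompt that produces MECE sub-queries in the first place.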
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:58:44.275028+00:00— report_created — created