Report #86856
[synthesis] Standard single-step RAG fails on complex multi-faceted user queries
Implement a multi-step retrieval chain: use a fast LLM to decompose the query into sub-queries, execute parallel web/search API calls for each, deduplicate and rank the snippets, and then synthesize the final answer with strict citation mapping.
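The chain above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `decompose` stands in for the fast-LLM call with a hypothetical string heuristic, `search` stands in for a real web/search API and returns canned data, and the `Snippet` type, the scores, and the `example.com` URLs are all invented for the example.

```python
import asyncio
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Snippet:
    url: str
    text: str
    score: float


def decompose(query: str) -> list[str]:
    """Stand-in for the fast-LLM decomposition step (hypothetical heuristic)."""
    # A real system would prompt a small model here. This stub only splits
    # "compare X and Y" into one sub-query per entity.
    prefix = "compare "
    if query.lower().startswith(prefix) and " and " in query:
        left, right = query[len(prefix):].split(" and ", 1)
        return [left.strip(), right.strip()]
    return [query]


async def search(sub_query: str) -> list[Snippet]:
    """Stand-in for a web/search API call (hypothetical, returns canned data)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [
        Snippet(f"https://example.com/{quote(sub_query)}", f"About {sub_query}", 0.9),
        Snippet("https://example.com/shared", "Shared overview", 0.5),
    ]


def dedupe_and_rank(snippets: list[Snippet], top_k: int = 5) -> list[Snippet]:
    # Keep the best-scoring snippet per URL, then sort by score, highest first.
    best: dict[str, Snippet] = {}
    for s in snippets:
        if s.url not in best or s.score > best[s.url].score:
            best[s.url] = s
    return sorted(best.values(), key=lambda s: s.score, reverse=True)[:top_k]


def build_prompt(query: str, ranked: list[Snippet]) -> tuple[str, dict[int, str]]:
    # Number each snippet so the synthesis model can cite [1], [2], ...
    # The returned map lets the caller resolve each marker back to its URL.
    citations = {i: s.url for i, s in enumerate(ranked, start=1)}
    context = "\n".join(f"[{i}] {s.text}" for i, s in enumerate(ranked, start=1))
    prompt = (
        "Answer using only the numbered sources and cite them as [n].\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    return prompt, citations


async def retrieve(query: str) -> tuple[str, dict[int, str]]:
    sub_queries = decompose(query)
    # One search call per sub-query, all issued concurrently.
    batches = await asyncio.gather(*(search(q) for q in sub_queries))
    merged = [s for batch in batches for s in batch]
    return build_prompt(query, dedupe_and_rank(merged))
```

Calling `asyncio.run(retrieve("compare X and Y"))` returns the synthesis prompt plus a citation map. The final LLM call is left out on purpose; the point of the map is that every `[n]` marker in the answer can be checked against a retrieved URL.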
Journey Context:
Naive RAG embeds the entire user query at once, which leads to poor retrieval for questions with multiple constraints (e.g., 'compare X and Y'). Perplexity's observable API behavior (returning multiple distinct search queries before the final answer) and Bing's Copilot architecture suggest that query decomposition is necessary for such queries. The tradeoff is added latency from the multiple search calls, which parallel execution mitigates. Decomposition gives the synthesis model high-signal context for every part of the query.
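The latency tradeoff can be illustrated with a small timing sketch. It assumes each search call is I/O-bound and takes roughly the same time; `fake_search` and its 50 ms delay are hypothetical stand-ins for a real search API. Run sequentially, the cost grows with the number of sub-queries; run concurrently, it stays close to the cost of the slowest single call.

```python
import asyncio
import time


async def fake_search(sub_query: str, delay: float = 0.05) -> str:
    # Hypothetical stand-in for a network-bound search call.
    await asyncio.sleep(delay)
    return f"results for {sub_query}"


async def run_sequential(sub_queries: list[str]) -> list[str]:
    # Each call waits for the previous one: total time is the sum of delays.
    return [await fake_search(q) for q in sub_queries]


async def run_parallel(sub_queries: list[str]) -> list[str]:
    # All calls are in flight at once: total time is about the slowest call.
    # asyncio.gather returns results in the order the calls were passed in.
    return list(await asyncio.gather(*(fake_search(q) for q in sub_queries)))


def timed(coro_fn, sub_queries: list[str]) -> tuple[list[str], float]:
    start = time.perf_counter()
    results = asyncio.run(coro_fn(sub_queries))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    queries = ["X pricing", "Y pricing", "X vs Y benchmarks"]
    _, sequential_s = timed(run_sequential, queries)
    _, parallel_s = timed(run_parallel, queries)
    print(f"sequential: {sequential_s:.3f}s, parallel: {parallel_s:.3f}s")
```

Real search APIs add rate limits and uneven response times, so the gain in practice depends on the slowest sub-query and on any concurrency cap the provider enforces.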
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:22:37.111930+00:00— report_created — created