Report #42972
[synthesis] Why does RAG fail on complex multi-hop questions, and how do production search agents solve it?
Decouple query understanding from retrieval. Use a fast model to rewrite the user query into multiple parallel search queries, then pass the aggregated snippets to a larger model for synthesis with strict citation constraints.
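A minimal sketch of the retrieval half of this split follows. It assumes the caller supplies a rewriting function (standing in for the fast model) and an async search function (standing in for a real search API). `retrieve`, `fake_rewrite`, `fake_search`, `FAKE_INDEX` and the `Snippet` type are hypothetical names, and the stubs return canned data so the example runs offline. De-duplicating by URL is one simple aggregation choice, not necessarily what any production system does.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class Snippet:
    url: str
    text: str


async def retrieve(
    question: str,
    rewrite: Callable[[str], list[str]],
    search: Callable[[str], Awaitable[list[Snippet]]],
    max_queries: int = 5,
) -> list[Snippet]:
    """Fan one question out into several search queries and merge the hits."""
    # Step 1: the fast model turns the conversational question into search strings.
    queries = rewrite(question)[:max_queries]
    # Step 2: run every search concurrently instead of one after another.
    results = await asyncio.gather(*(search(q) for q in queries))
    # Step 3: flatten and de-duplicate by URL, keeping first-seen order.
    seen: set[str] = set()
    merged: list[Snippet] = []
    for hits in results:
        for snip in hits:
            if snip.url not in seen:
                seen.add(snip.url)
                merged.append(snip)
    return merged


# --- Hypothetical stubs so the sketch runs without any model or search API ---
FAKE_INDEX = {
    "python first release year": [
        Snippet("https://example.org/history", "Python 0.9.0 was released in 1991."),
    ],
    "python creator": [
        Snippet("https://example.org/guido", "Python was created by Guido van Rossum."),
        Snippet("https://example.org/history", "Python 0.9.0 was released in 1991."),
    ],
}


def fake_rewrite(question: str) -> list[str]:
    # Ignores the question and returns canned queries; a real system would
    # prompt a small LLM to emit a handful of search-optimized strings.
    return ["python first release year", "python creator"]


async def fake_search(query: str) -> list[Snippet]:
    await asyncio.sleep(0)  # placeholder for network latency
    return FAKE_INDEX.get(query, [])


if __name__ == "__main__":
    question = "Who made Python and when did it come out?"
    snippets = asyncio.run(retrieve(question, fake_rewrite, fake_search))
    for s in snippets:
        print(s.url, "-", s.text)
```

The `max_queries=5` default mirrors the upper end of the 2-5 query range described in this report; in practice the `rewrite` callable would wrap a call to the fast model and `search` would wrap an HTTP client.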
Journey Context:
Standard RAG embeds the user's raw question, which performs poorly because questions are conversational and rarely phrased the way a search engine expects. A single vector search also tends to miss some facets of a complex query: a multi-hop question whose answer depends on two separate facts needs evidence that one embedding of the whole question is unlikely to retrieve in a single pass. Perplexity's architecture illustrates a multi-step pipeline: first, an LLM rewrites the query into 2-5 search-optimized strings; second, the system runs those web searches in parallel; third, it extracts snippets from the results; fourth, a more powerful model synthesizes the answer and is constrained to cite the extracted snippets. Separating retrieval optimization from synthesis is the key to high-signal answers.
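The fourth step, citation-constrained synthesis, can be approximated by numbering the snippets in the prompt and checking the model's output afterward. This is a sketch under assumptions: the `[n]` citation format, the prompt wording, and the names `build_synthesis_prompt`, `check_citations` and `Snippet` are illustrative, not Perplexity's actual implementation, and the sentence splitter is deliberately naive (it expects citations before each sentence's final punctuation).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    url: str
    text: str


def build_synthesis_prompt(question: str, snippets: list[Snippet]) -> str:
    """Number the snippets so the synthesis model can cite them as [1], [2], ..."""
    sources = "\n".join(
        f"[{i}] {s.url}\n{s.text}" for i, s in enumerate(snippets, start=1)
    )
    return (
        "Answer the question using ONLY the sources below.\n"
        "Put at least one citation like [1] before the final period of every sentence.\n"
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )


CITATION = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, num_sources: int) -> list[str]:
    """Return a list of problems; an empty list means the answer passes."""
    problems: list[str] = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    for sentence in sentences:
        cited = [int(n) for n in CITATION.findall(sentence)]
        if not cited:
            problems.append(f"uncited sentence: {sentence!r}")
        for n in cited:
            # Citations must point at a snippet that was actually supplied.
            if not 1 <= n <= num_sources:
                problems.append(f"citation [{n}] does not match any source")
    return problems


if __name__ == "__main__":
    snips = [
        Snippet("https://example.org/guido", "Python was created by Guido van Rossum."),
        Snippet("https://example.org/history", "Python 0.9.0 was released in 1991."),
    ]
    print(build_synthesis_prompt("Who made Python and when?", snips))
    good = "Python was created by Guido van Rossum [1]. It was first released in 1991 [2]."
    bad = "Python was created by Guido van Rossum. It was first released in 1991 [3]."
    print(check_citations(good, len(snips)))  # prints []
    print(check_citations(bad, len(snips)))   # one uncited sentence, one bad index
```

When `check_citations` reports problems, a pipeline could reject the answer or ask the synthesis model to regenerate it; a stricter check would also verify that each cited snippet actually supports the sentence, which simple pattern matching cannot do.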
Lifecycle
2026-06-19T02:36:00.414681+00:00— report_created — created