Report #41530
[synthesis] RAG agents fail on complex queries because they try to search and answer in one step
Implement a map-reduce RAG pattern: decompose the user query into sub-queries, execute searches in parallel, extract relevant snippets per sub-query, and sequentially synthesize the final answer using only the extracted snippets.
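A minimal sketch of the pattern above, assuming Python with asyncio. Every name here (map_reduce_rag, the toy_* helpers, CORPUS) is hypothetical and stands in for whatever LLM and search clients are actually in use; none of it is a real library API. The toy helpers only make the control flow runnable without network calls.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical callable types: in a real system each would wrap an LLM or search client.
Decompose = Callable[[str], Awaitable[List[str]]]
Search = Callable[[str], Awaitable[List[str]]]
Extract = Callable[[str, List[str]], Awaitable[str]]
Synthesize = Callable[[str, List[str]], Awaitable[str]]


async def map_reduce_rag(query: str, decompose: Decompose, search: Search,
                         extract: Extract, synthesize: Synthesize) -> str:
    # Step 1: split the user query into sub-queries.
    sub_queries = await decompose(query)

    async def map_one(sub_query: str) -> str:
        # Map step: search, then keep only the snippet relevant to this sub-query.
        docs = await search(sub_query)
        return await extract(sub_query, docs)

    # Run the map step for all sub-queries concurrently; gather preserves input order.
    snippets = await asyncio.gather(*(map_one(sq) for sq in sub_queries))

    # Reduce step: one synthesis call that sees only the extracted snippets.
    return await synthesize(query, list(snippets))


# --- Toy stand-ins (assumptions for illustration only) ---
CORPUS: Dict[str, List[str]] = {
    "alpha": ["alpha is a row store.", "alpha has a large plugin ecosystem."],
    "beta": ["beta is a column store."],
}


async def toy_decompose(query: str) -> List[str]:
    # Stand-in for an LLM call: one sub-query per subject of "Compare A and B".
    subjects = query.replace("Compare ", "", 1).split(" and ")
    return [f"What is {s.strip()}?" for s in subjects]


async def toy_search(sub_query: str) -> List[str]:
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [d for key, docs in CORPUS.items() if key in sub_query for d in docs]


async def toy_extract(sub_query: str, docs: List[str]) -> str:
    # Stand-in for snippet extraction: keep the first hit only.
    return f"{sub_query} -> {docs[0]}" if docs else f"{sub_query} -> (no results)"


async def toy_synthesize(query: str, snippets: List[str]) -> str:
    # Stand-in for the final LLM call: it receives snippets, never raw documents.
    return query + "\n" + "\n".join(snippets)


if __name__ == "__main__":
    print(asyncio.run(map_reduce_rag("Compare alpha and beta", toy_decompose,
                                     toy_search, toy_extract, toy_synthesize)))
```

Passing the four stages in as callables keeps retrieval decoupled from generation: the search backend or the decomposition prompt can be swapped without touching the synthesis step.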
Journey Context:
Standard RAG embeds the query, runs a vector search, and dumps the results into the context. This fails for multi-faceted questions (e.g., 'Compare X and Y'). Perplexity's observable API behavior (Pro Search) shows a distinct two-phase latency profile: a long initial pause (consistent with query decomposition + parallel search) followed by streaming generation. The synthesis is that the retrieval chain must be decoupled from generation and operate on sub-problems. A common mistake is to assume the fix is a better embedding model; what is actually needed is better query decomposition and parallel execution.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:10:56.443054+00:00— report_created — created