Report #54524
[synthesis] Why does single-pass RAG fail on complex multi-hop queries in AI answer engines?
Implement an iterative retrieval loop in which the LLM decomposes the query, executes searches, evaluates the results for sufficiency, and dynamically spawns sub-queries until the retrieved context is sufficient to answer the question. Only then should it generate the final answer.
Journey Context:
Standard RAG embeds the query once, fetches the top-K chunks, and generates from them. This fails on multi-hop questions (e.g., 'Who is the CEO of the company that acquired X?'), because the chunk that names the acquirer and the chunk that names its CEO often do not both rank in the top-K for the original query. Perplexity's Pro Search, judging by its observable behavior, runs a multi-step agent loop: query -> search -> extract -> evaluate -> search again. The tradeoff is higher latency and cost per query. In exchange, the loop addresses the multi-hop failure mode and mitigates 'lost in the middle' by checking that the context actually contains the answer before synthesis.
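The loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Perplexity's implementation: the corpus, the word-overlap `search` function, and the rule-based `evaluate` function (standing in for the LLM's sufficiency check and sub-query generation) are all hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical toy corpus standing in for a search index.
CORPUS = [
    "Acme Corp acquired Widgets Inc in 2021.",
    "Jane Doe is the CEO of Acme Corp.",
    "John Smith is the CEO of Widgets Inc.",
    "Widgets Inc makes industrial fasteners.",
]

QUESTION = "Who is the CEO of the company that acquired Widgets Inc?"


def tokens(text: str) -> set:
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, k: int = 1) -> List[str]:
    """Return the k documents with the largest word overlap with the query."""
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:k]


def evaluate(target: str, context: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Stand-in for the LLM sufficiency check (assumption: rule-based here).

    Returns (answer, None) when the context supports an answer,
    otherwise (None, follow_up_query) for the next missing hop.
    """
    acquirer = None
    for doc in context:
        m = re.match(rf"(.+) acquired {re.escape(target)}", doc)
        if m:
            acquirer = m.group(1)
    if acquirer is None:
        return None, f"Which company acquired {target}"
    for doc in context:
        m = re.match(rf"(.+) is the CEO of {re.escape(acquirer)}", doc)
        if m:
            return m.group(1), None
    return None, f"Who is the CEO of {acquirer}"


def iterative_answer(question: str, target: str, max_hops: int = 4) -> Optional[str]:
    """query -> search -> evaluate -> search again, until sufficient or out of hops."""
    context: List[str] = []
    query = question
    for _ in range(max_hops):
        for doc in search(query):
            if doc not in context:
                context.append(doc)
        answer, follow_up = evaluate(target, context)
        if answer is not None:
            return answer
        query = follow_up  # spawn a sub-query for the missing hop
    return None  # budget exhausted without sufficient context


if __name__ == "__main__":
    print("single pass, top-2:", search(QUESTION, k=2))
    print("iterative answer:", iterative_answer(QUESTION, "Widgets Inc"))
```

On this toy corpus, a single top-2 retrieval for the original question returns the two "CEO" sentences and misses the acquisition fact, while the loop reaches "Jane Doe" after three searches. The `max_hops` cap is one way to bound the extra latency and cost noted above.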
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:00:51.722838+00:00— report_created — created