Report #62357
[synthesis] Single-shot RAG pipeline returns irrelevant context for complex user queries in AI search products
Implement iterative retrieval where the LLM generates intermediate search queries based on partial results before synthesizing the final answer.
Journey Context:
Standard RAG embeds the user query, runs a single vector search, and dumps the results into the prompt. Perplexity's observable API behavior (especially Pro Search) suggests a multi-step chain: the model decomposes the query, searches, reads snippets, and then decides whether it needs to search again for missing information. The takeaway from this synthesis is that retrieval works better as an interactive tool the LLM uses mid-generation than as a one-time pre-processing step. The tradeoff is higher latency and token cost per query, in exchange for a final context with a much higher signal-to-noise ratio, which reduces the risk of hallucination.
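The loop described above can be sketched roughly as follows. This is a minimal illustration, not Perplexity's implementation: `toy_search`, `toy_planner`, `CORPUS`, and `iterative_retrieve` are hypothetical names, the planner is a hard-coded stand-in for an LLM call that proposes the next search query, and the retriever is an exact-match lookup standing in for vector search.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy corpus standing in for a real search index.
CORPUS = {
    "capital of france": "Paris is the capital of France.",
    "population of paris": "Paris has roughly 2.1 million residents.",
}


def toy_search(query: str) -> List[str]:
    """Hypothetical retriever: exact-match lookup in place of vector search."""
    hit = CORPUS.get(query.lower())
    return [hit] if hit else []


def toy_planner(question: str, context: List[str]) -> Optional[str]:
    """Hypothetical stand-in for the LLM call that picks the next search.

    Returns a follow-up query, or None once the context looks sufficient.
    A real planner would prompt a model with the question and the snippets.
    """
    if not any("capital" in s for s in context):
        return "capital of France"
    if not any("residents" in s for s in context):
        return "population of Paris"
    return None


def iterative_retrieve(
    question: str,
    plan_next_query: Callable[[str, List[str]], Optional[str]],
    search: Callable[[str], List[str]],
    max_steps: int = 4,
) -> Tuple[List[str], List[str]]:
    """Alternate between planning a query and searching until done.

    max_steps caps latency and token cost, the tradeoff noted above.
    """
    context: List[str] = []
    queries: List[str] = []
    for _ in range(max_steps):
        query = plan_next_query(question, context)
        # Stop when the planner is satisfied or starts repeating itself.
        if query is None or query in queries:
            break
        queries.append(query)
        for snippet in search(query):
            if snippet not in context:  # drop duplicate snippets
                context.append(snippet)
    return context, queries


if __name__ == "__main__":
    ctx, asked = iterative_retrieve(
        "How many people live in the capital of France?",
        toy_planner,
        toy_search,
    )
    print(asked)
    print(ctx)
```

The collected context would then go to a final synthesis call. In a real pipeline the step cap and the repeated-query check are the main levers for bounding the extra latency and token cost.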
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:09:06.310269+00:00— report_created — created