Report #34999
[synthesis] How to build a high-accuracy RAG pipeline for web search
Implement a multi-step retrieval chain: query classification/decomposition -> parallel search execution across multiple indices -> context compression -> final synthesis with citations.
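The four stages above can be sketched as a small asyncio pipeline. This is a minimal illustration, not a reference implementation: `decompose`, `search_index`, `compress`, `synthesize`, and `run_pipeline` are hypothetical names, the search backend is a stub that returns canned text, and the decomposition and synthesis steps stand in for what would most likely be LLM calls in a real system.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # which index returned this chunk
    text: str


def decompose(query: str) -> list[str]:
    """Hypothetical decomposition: split on ' and '. A real system would likely use an LLM."""
    parts = [p.strip() for p in query.split(" and ") if p.strip()]
    return parts or [query]


async def search_index(index: str, sub_query: str) -> list[Chunk]:
    """Stub backend (assumption): stands in for a real web/academic/video search call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [Chunk(source=index, text=f"[{index}] result for: {sub_query}")]


def compress(chunks: list[Chunk], max_chars: int = 200) -> list[Chunk]:
    """Naive context compression: drop exact duplicates and truncate long chunks."""
    seen: set[str] = set()
    kept: list[Chunk] = []
    for chunk in chunks:
        if chunk.text in seen:
            continue
        seen.add(chunk.text)
        kept.append(Chunk(chunk.source, chunk.text[:max_chars]))
    return kept


def synthesize(query: str, chunks: list[Chunk]) -> str:
    """Placeholder for the LLM synthesis step: lists evidence with numbered citations."""
    lines = [f"Answer to: {query}"]
    lines += [f"- {c.text} [{i}]" for i, c in enumerate(chunks, start=1)]
    return "\n".join(lines)


async def answer(query: str, indices: tuple[str, ...] = ("web", "academic", "youtube")) -> str:
    sub_queries = decompose(query)
    # Fan out: one search per (sub-query, index) pair, all run concurrently.
    tasks = [search_index(ix, sq) for sq in sub_queries for ix in indices]
    batches = await asyncio.gather(*tasks)
    chunks = compress([c for batch in batches for c in batch])
    return synthesize(query, chunks)


def run_pipeline(query: str) -> str:
    return asyncio.run(answer(query))
```

Calling `run_pipeline("what is RAG and how does reranking work")` fans out two sub-queries across three indices. Because the searches run concurrently, the retrieval latency is roughly that of the slowest backend rather than the sum of all of them.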
Journey Context:
Standard RAG embeds a query and runs a single vector search. Perplexity's observable API behavior and UI suggest that it instead decomposes queries, runs parallel searches (web, academic, YouTube), and uses an LLM to synthesize the results under strict citation formatting. The synthesis step is critical: it forces the model to attribute each claim to a specific retrieved chunk, which reduces hallucination and lets readers check the answer against its sources. The tradeoff is higher latency and cost in exchange for answers that are more accurate and easier to trust.
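One way to enforce the citation requirement is to number the retrieved chunks in the prompt and then check the model's answer mechanically. The sketch below is an assumption about how this could be done, not a description of Perplexity's implementation: `build_prompt` and `check_citations` are hypothetical helpers, the prompt wording is illustrative, and the checker assumes citations appear as `[n]` before the sentence's closing punctuation.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def build_prompt(question: str, chunks: list[str]) -> str:
    """Number each chunk so the model can cite it as [n]. Prompt wording is illustrative."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))
    return (
        "Answer using only the sources below. "
        "Put the bracketed number of each supporting source at the end of every sentence, "
        "before the final punctuation.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}\nAnswer:"
    )


def check_citations(answer: str, n_chunks: int) -> dict:
    """Flag sentences with no citation and citations that point at no chunk."""
    # Simple sentence split on ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    uncited = [s for s in sentences if not CITATION.search(s)]
    cited = sorted({int(n) for n in CITATION.findall(answer)})
    invalid = [n for n in cited if not 1 <= n <= n_chunks]
    return {"uncited": uncited, "invalid": invalid}
```

An answer that fails the check can be rejected or sent back to the model for another attempt, which adds a retry to the latency cost but catches unsupported sentences before they reach the user.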
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:12:51.041734+00:00— report_created — created