Report #21035
[synthesis] How to architect a retrieval-augmented generation \(RAG\) pipeline with high accuracy and Perplexity-style citations
Implement a multi-step retrieval chain: 1\) query decomposition \(break a complex query into search intents that can run in parallel\), 2\) parallel search execution across multiple indices and the web, 3\) context ranking and filtering, 4\) synthesis under strict citation constraints \(e.g., every claim must be matched to a specific source index\).
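The four steps can be sketched end to end as follows. This is a minimal illustration, not Perplexity's implementation: the in-memory `INDICES`, the `Chunk` type, the " and "-splitting `decompose`, and the token-overlap scoring are all hypothetical stand-ins for an LLM-based decomposer, real search backends, and a proper reranker. The final LLM call is left out; the sketch stops at the prompt that carries the citation constraint.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str  # where the text came from (URL, document id, ...)
    text: str


# Hypothetical toy indices; stand-ins for a vector store, keyword index, or web search.
INDICES = {
    "docs": [
        Chunk("docs/rag.md", "Query decomposition splits a complex question into sub-queries."),
        Chunk("docs/rank.md", "Reranking filters retrieved chunks before synthesis."),
    ],
    "web": [
        Chunk("example.org/cite", "Citations map each claim to a numbered source."),
        Chunk("example.org/misc", "Unrelated text about cooking pasta."),
    ],
}


def tokens(text):
    # Lowercased words longer than three characters; a crude stop-word filter.
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if len(w) > 3}


def decompose(query):
    # Step 1 (placeholder): split on " and ". A real system would likely use an LLM.
    return [part.strip() for part in query.split(" and ") if part.strip()]


def search(index_name, sub_query):
    # Return chunks that share at least one token with the sub-query.
    q = tokens(sub_query)
    return [c for c in INDICES[index_name] if q & tokens(c.text)]


def retrieve(query):
    # Step 2: run every (index, sub-query) pair in parallel, then merge and dedupe.
    jobs = [(name, sq) for sq in decompose(query) for name in INDICES]
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(lambda job: search(*job), jobs))
    seen, merged = set(), []
    for hits in results:
        for chunk in hits:
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged


def rank(query, chunks, top_k=3):
    # Step 3: keep the top_k chunks by token overlap with the full query.
    q = tokens(query)
    return sorted(chunks, key=lambda c: len(q & tokens(c.text)), reverse=True)[:top_k]


def build_prompt(query, chunks):
    # Step 4: number the sources and state the citation constraint explicitly.
    sources = "\n".join(
        f"[{i}] {c.text} (source: {c.source})" for i, c in enumerate(chunks, 1)
    )
    return (
        "Answer using ONLY the numbered sources below. "
        "End every sentence with the index of the source that supports it, e.g. [1].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    question = "query decomposition and citations per claim"
    print(build_prompt(question, rank(question, retrieve(question))))
```

Numbering the sources in the prompt is what lets the model's citations be checked mechanically afterwards: each `[n]` in the answer maps back to exactly one retrieved chunk.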
Journey Context:
Naive RAG \(embed query -> search -> stuff prompt\) fails on complex questions for two reasons: a multi-part query embeds to a vector that is not close to any single relevant document, and loosely related chunks pollute the context window. Perplexity's observable API behavior suggests that it decomposes the query first to maximize recall, then has an LLM synthesize only from the retrieved chunks, with hallucination heavily penalized. The citation is not a post-hoc explanation; it is a strict output format enforced during generation.
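One way to make the citation format checkable is a validation gate on the generated answer. The sketch below is an assumption about how such a gate could look, not a description of Perplexity's system: `check_citations` is a hypothetical helper, it assumes citations are written as `[n]` before the sentence's closing punctuation, and it uses a naive sentence splitter. It verifies only the format (every sentence cites an existing source index), not whether the cited chunk actually supports the claim.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def split_sentences(answer):
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def check_citations(answer, num_sources):
    """Return a list of problems; an empty list means the format is satisfied."""
    problems = []
    for sentence in split_sentences(answer):
        cited = [int(n) for n in CITATION.findall(sentence)]
        if not cited:
            problems.append(f"uncited: {sentence}")
        for n in cited:
            # Source indices are 1-based and must refer to a retrieved chunk.
            if not 1 <= n <= num_sources:
                problems.append(f"unknown source [{n}]: {sentence}")
    return problems


if __name__ == "__main__":
    answer = (
        "Decomposition improves recall [1]. "
        "This claim has no source. "
        "Another sentence cites a missing chunk [5]."
    )
    for problem in check_citations(answer, num_sources=2):
        print(problem)
```

A non-empty result could be used to reject the answer and regenerate it, which keeps uncited claims from reaching the user even when the model ignores the prompt-level constraint.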
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:42:42.154962+00:00— report_created — created