Report #57982
[synthesis] How to architect retrieval for an AI search product beyond naive RAG
Separate query understanding from retrieval: classify the intent, decompose complex queries into sub-queries, retrieve for each sub-query independently and in parallel from multiple sources, then synthesize the answer in a dedicated generation pass with mandatory citation grounding. The decomposition step is not optional; it is the architectural component that makes retrieval work.
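The four stages can be sketched in Python as follows. This is a minimal sketch, assuming a toy in-memory corpus in place of real search backends and rule-based stand-ins for the LLM calls that would normally classify, decompose, and generate. Every name here (`SOURCES`, `classify_intent`, `decompose`, `retrieve`, `enforce_citations`, `answer`) is hypothetical, not Perplexity's or any library's API. What it is meant to show is the shape: sub-queries fan out concurrently with `asyncio.gather`, each sub-query hits every source, and the generation pass is followed by a gate that drops any sentence not backed by retrieved evidence.

```python
import asyncio
import re
from dataclasses import dataclass

# All names below are hypothetical; the toy corpus and rule-based steps stand in
# for real search backends and LLM calls.
SOURCES = {
    "web": {"w1": "Plan A costs $10 per month.", "w2": "Plan B costs $25 per month."},
    "docs": {"d1": "Median API latency is 120 ms.", "d2": "Plan B includes priority support."},
}
STOPWORDS = {"what", "does", "is", "the", "a", "how", "and"}
CITATION = re.compile(r"\[(\w+:\w+)\]")


@dataclass
class Passage:
    cite_id: str  # "source:doc_id"
    text: str
    score: int


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def classify_intent(query: str) -> str:
    # Stand-in for an intent classifier: multi-part questions get decomposed.
    return "multi_part" if " and " in query.lower() else "single"


def decompose(query: str, intent: str) -> list[str]:
    # Stand-in for an LLM decomposition step: split on "and" / "?".
    if intent == "single":
        return [query.strip()]
    return [p.strip() for p in re.split(r"\?|\band\b", query) if p.strip()]


async def search(source: str, sub_query: str) -> list[Passage]:
    # Toy keyword-overlap search; a real backend call would go here.
    await asyncio.sleep(0)  # simulate I/O so gather() interleaves the calls
    terms = tokens(sub_query) - STOPWORDS
    hits = [
        Passage(f"{source}:{doc_id}", text, len(terms & tokens(text)))
        for doc_id, text in SOURCES[source].items()
    ]
    return [h for h in hits if h.score > 0]


async def retrieve(sub_query: str, k: int = 1) -> list[Passage]:
    # Fan one sub-query out to every source concurrently, then merge by score.
    per_source = await asyncio.gather(*(search(s, sub_query) for s in SOURCES))
    merged = [p for hits in per_source for p in hits]
    return sorted(merged, key=lambda p: p.score, reverse=True)[:k]


def enforce_citations(draft: list[str], evidence: dict[str, str]) -> list[str]:
    # Grounding gate: keep only sentences whose citations all point at retrieved evidence.
    kept = []
    for sentence in draft:
        cited = CITATION.findall(sentence)
        if cited and all(c in evidence for c in cited):
            kept.append(sentence)
    return kept


async def answer(query: str) -> list[str]:
    intent = classify_intent(query)
    sub_queries = decompose(query, intent)
    # Sub-queries are retrieved independently and in parallel.
    results = await asyncio.gather(*(retrieve(q) for q in sub_queries))
    evidence = {p.cite_id: p.text for hits in results for p in hits}
    # Stand-in for the dedicated generation pass: one cited sentence per passage.
    draft = [f"{text} [{cite_id}]" for cite_id, text in evidence.items()]
    # Simulated hallucination with no citation; the gate should drop it.
    draft.append("Plan B is the most popular plan.")
    return enforce_citations(draft, evidence)


print(asyncio.run(answer("What does Plan B cost and what is the API latency?")))
# → ['Plan B costs $25 per month. [web:w2]', 'Median API latency is 120 ms. [docs:d1]']
```

Dropping uncited sentences is only one possible policy for the grounding gate; regenerating the answer or flagging the unsupported claim are alternatives with different latency and UX trade-offs.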
Journey Context:
Naive RAG embeds the user query as a single vector and runs a similarity search against it. Perplexity's observable API behavior shows multiple parallel search operations firing for multi-part questions before synthesis begins, and Perplexity CEO Aravind Srinivas has publicly described query understanding as the company's core moat. The synthesis: an embedding model conflates the different aspects of a complex query into one vector, so single-vector retrieval misses information needs that are orthogonal to each other. The decomposition step is the highest-leverage component: it turns a fuzzy user intent into discrete, retrievable sub-problems. Without it, a larger context window or a more capable model yields diminishing returns, because the retrieved context is already wrong.
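The single-vector failure mode can be illustrated with hand-made 3-dimensional vectors. These are hypothetical and not produced by any real embedding model, and the sketch assumes a compound query embeds as the normalized sum of its two parts, which simplifies how real encoders behave. Two generic "overview" documents sit closer to that blended vector than either document that actually answers a part, so top-2 retrieval on the compound vector returns neither answer, while one retrieval per sub-query finds both.

```python
import math

# Toy 3-d "embeddings", hand-made for illustration; NOT output from a real model.
# Hypothetical dimensions: 0 = pricing, 1 = latency, 2 = generic overview wording.
CORPUS = {
    "pricing_doc": [1.0, 0.0, 0.0],   # answers only the pricing need
    "latency_doc": [0.0, 1.0, 0.0],   # answers only the latency need
    "overview_a": [0.6, 0.6, 0.53],   # vaguely about both, answers neither
    "overview_b": [0.55, 0.65, 0.52],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query_vec, k):
    # Rank every document by cosine similarity to the query vector.
    ranked = sorted(CORPUS, key=lambda d: cosine(query_vec, CORPUS[d]), reverse=True)
    return ranked[:k]


# Assumed sub-query vectors for "What does X cost?" and "How fast is X?".
pricing_q = [1.0, 0.0, 0.0]
latency_q = [0.0, 1.0, 0.0]

# Simplifying assumption: the compound query embeds as the normalized sum of its parts.
compound_q = normalize([p + l for p, l in zip(pricing_q, latency_q)])

single_vector_hits = top_k(compound_q, k=2)
decomposed_hits = top_k(pricing_q, k=1) + top_k(latency_q, k=1)

print("single vector:", single_vector_hits)  # → single vector: ['overview_b', 'overview_a']
print("decomposed:", decomposed_hits)        # → decomposed: ['pricing_doc', 'latency_doc']
```

The vectors are contrived to make the effect visible; with real models the blending shows up as a tendency rather than a guaranteed miss, which is why the same retrieval budget spent per sub-query tends to cover more of the user's actual needs.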
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:48:53.625413+00:00— report_created — created