Report #86424
[synthesis] Single-vector RAG retrieval fails on complex multi-faceted user queries
Use an LLM to decompose the user query into multiple independent search queries, execute them in parallel, deduplicate the results, and then synthesize the final answer from the aggregated context.
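A minimal Python sketch of this pipeline, under stated assumptions. It assumes an `llm` callable that maps a prompt string to a completion string, and a `search` callable that returns a list of `{"url", "text"}` snippet dicts. Both callables, every function name, and the prompt wording are hypothetical placeholders, not any specific vendor's API. Sub-queries run concurrently on a thread pool. Deduplication here is exact matching on case- and whitespace-normalized snippet text.

```python
import concurrent.futures
from collections.abc import Callable

# Hypothetical snippet shape: {"url": ..., "text": ...}. Adapt to your retriever.
Snippet = dict[str, str]
LLM = Callable[[str], str]
Search = Callable[[str], list[Snippet]]


def decompose_query(query: str, llm: LLM) -> list[str]:
    """Ask the LLM for independent sub-queries, one per line."""
    prompt = (
        "Split this question into 3-5 independent search queries, one per line, "
        "each targeting a single entity or aspect:\n" + query
    )
    # Strip bullet dashes and surrounding whitespace, then drop blank lines.
    sub_queries = [line.strip(" -\t") for line in llm(prompt).splitlines()]
    sub_queries = [q for q in sub_queries if q]
    return sub_queries or [query]  # fall back to the original query if parsing fails


def parallel_search(sub_queries: list[str], search: Search, max_workers: int = 5) -> list[list[Snippet]]:
    """Run one search per sub-query concurrently; results keep sub-query order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search, sub_queries))


def deduplicate(result_lists: list[list[Snippet]]) -> list[Snippet]:
    """Keep the first occurrence of each snippet, comparing case/whitespace-normalized text."""
    seen: set[str] = set()
    unique: list[Snippet] = []
    for results in result_lists:
        for snippet in results:
            key = " ".join(snippet["text"].lower().split())
            if key not in seen:
                seen.add(key)
                unique.append(snippet)
    return unique


def answer(query: str, llm: LLM, search: Search) -> str:
    """Decompose, search in parallel, deduplicate, then synthesize one answer."""
    sub_queries = decompose_query(query, llm)
    snippets = deduplicate(parallel_search(sub_queries, search))
    context = "\n\n".join(
        f"[{i}] {s['url']}\n{s['text']}" for i, s in enumerate(snippets, start=1)
    )
    return llm(
        "Answer the question using only the numbered sources below, citing them as [n].\n\n"
        f"{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # Stub LLM and search backend, for demonstration only.
    def fake_llm(prompt: str) -> str:
        if prompt.startswith("Split"):
            return "- X architecture\n- Y architecture"
        return f"(answer synthesized from {prompt.count('https://')} sources)"

    def fake_search(q: str) -> list[Snippet]:
        return [
            {"url": f"https://example.com/{q.split()[0]}", "text": f"{q} overview"},
            {"url": "https://example.com/shared", "text": "Shared background"},
        ]

    # prints: (answer synthesized from 3 sources)
    print(answer("Compare the architecture of X and Y", fake_llm, fake_search))
```

In the stub run, both sub-searches return the same "Shared background" snippet, and deduplication collapses four results to three before synthesis. Exact-text matching is the cheapest option. Near-duplicate detection, for example embedding similarity between snippets, would also catch paraphrased overlap, at extra cost.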
Journey Context:
Naive RAG embeds the whole user query as a single vector, which loses nuance (e.g., 'Compare the architecture of X and Y' averages X and Y into one vector that sits between them and matches neither well). Perplexity's architecture, as observed through its network requests and its step-by-step 'Pro Search' UI, shows that it breaks queries down: instead of one search, it issues 3-5 sub-searches. The tradeoff is higher latency and API cost, but recall and precision improve markedly because each sub-query targets a single entity or aspect. Deduplicating results before synthesis is critical, since sub-queries often return overlapping snippets that would otherwise waste the context window.
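A toy numeric illustration of the averaging problem. As a simplifying assumption, it treats the combined query's embedding as the normalized mean of two entity embeddings. The 3-dimensional vectors are made up for illustration. Real embedding models do not literally average entity vectors, so this only models the effect described above.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def normalized_mean(a: list[float], b: list[float]) -> list[float]:
    """Unit-length average of two vectors (toy stand-in for embedding a two-entity query)."""
    mean = [(x + y) / 2 for x, y in zip(a, b)]
    norm = math.sqrt(sum(v * v for v in mean))
    return [v / norm for v in mean]


# Hypothetical toy embeddings: dimension 0 = entity X, 1 = entity Y, 2 = other content.
x_doc = [1.0, 0.0, 0.0]        # a document dedicated to X's architecture
generic_doc = [1.0, 1.0, 0.5]  # a shallow overview that mentions both X and Y
x_subquery = [1.0, 0.0, 0.0]   # a decomposed sub-query aimed squarely at X
combined_query = normalized_mean([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])

print("combined query vs X doc:      ", round(cosine(combined_query, x_doc), 3))        # 0.707
print("combined query vs generic doc:", round(cosine(combined_query, generic_doc), 3))  # 0.943
print("X sub-query vs X doc:         ", round(cosine(x_subquery, x_doc), 3))            # 1.0
print("X sub-query vs generic doc:   ", round(cosine(x_subquery, generic_doc), 3))      # 0.667
```

With the blended query, the shallow overview outranks the dedicated X document. With the decomposed sub-query, the ranking flips. In this toy model, that reversal is the precision argument for decomposition.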
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:39:15.864562+00:00— report_created — created