Report #75339
[synthesis] AI search and retrieval products use single-pipeline RAG with search engine ranking, producing shallow or SEO-biased results
Decompose queries into sub-queries, dispatch them to multiple retrieval backends (web, academic, forum, news) in parallel, then rerank the results with a separate model before synthesis. Never trust the search engine's ranking as the final ordering for answer quality.
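A minimal sketch of the decompose-and-fan-out step, assuming Python's asyncio. Everything named here is hypothetical and for illustration only: `decompose` is a naive string split standing in for what would more likely be an LLM call, and `search_backend` is a stub that fabricates placeholder hits instead of calling a real search API.

```python
import asyncio
from dataclasses import dataclass

BACKENDS = ("web", "academic", "forum", "news")  # backend names taken from the report


@dataclass(frozen=True)
class Hit:
    url: str
    text: str
    backend: str
    engine_rank: int  # position assigned by the backend, kept only as metadata


def decompose(query: str) -> list[str]:
    # Hypothetical placeholder: split on " and ". A real system would
    # probably ask an LLM to produce the sub-queries.
    parts = [p.strip() for p in query.split(" and ") if p.strip()]
    return parts or [query]


async def search_backend(backend: str, sub_query: str) -> list[Hit]:
    # Hypothetical stub: stands in for one network call to one search API.
    await asyncio.sleep(0)
    slug = sub_query.replace(" ", "+")
    return [
        Hit(
            url=f"https://{backend}.example/{i}?q={slug}",
            text=f"{backend} result {i} for {sub_query}",
            backend=backend,
            engine_rank=i,
        )
        for i in range(2)
    ]


async def retrieve(query: str) -> list[Hit]:
    sub_queries = decompose(query)
    # One task per (sub-query, backend) pair, all awaited concurrently.
    tasks = [search_backend(b, sq) for sq in sub_queries for b in BACKENDS]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    seen: set[str] = set()
    hits: list[Hit] = []
    for result in results:
        if isinstance(result, BaseException):
            continue  # a failing backend should not sink the whole request
        for hit in result:
            if hit.url not in seen:  # drop duplicates across sub-queries
                seen.add(hit.url)
                hits.append(hit)
    return hits


hits = asyncio.run(retrieve("rust async and tokio runtime"))
```

The returned list is deliberately unordered with respect to quality: `engine_rank` is carried along as metadata, and ordering is left to the reranking step.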
Journey Context:
Naive RAG sends one query to one search engine and feeds the top-K results to the LLM. Perplexity's observable API behavior suggests a different architecture: a single user query triggers multiple parallel sub-queries (visible in the batched appearance of citations in streaming responses), each potentially targeting a different backend. Cross-referencing this with Aravind Srinivas's public statements about their architecture and with the Perplexity API's streaming event structure points to the critical insight: the reranking step is where the real value is. Search engines optimize for click-through and SEO; the reranker optimizes for answer relevance. This would explain why Perplexity surfaces forum posts and academic papers that are buried in traditional search results. The tradeoff: parallel retrieval adds latency (mitigated by async dispatch) and cost (multiple search API calls plus the reranker), but the improvement in answer quality is substantial. The pattern generalizes: any RAG system benefits from multi-source retrieval plus post-retrieval reranking over reliance on a single source.
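To illustrate why the reranker's ordering can differ from the engine's, here is a small sketch in Python. The scoring function is a toy token-overlap measure chosen only to keep the example self-contained; it is an assumption, not Perplexity's reranker, and a production system would more likely plug in a cross-encoder or similar relevance model through the `score` parameter. The candidate texts and the `Candidate` type are invented for the example.

```python
from typing import Callable, NamedTuple


class Candidate(NamedTuple):
    source: str       # which backend returned it
    text: str
    engine_rank: int  # the search engine's own position, lower is higher


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query tokens present in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def rerank(
    query: str,
    candidates: list[Candidate],
    score: Callable[[str, str], float] = overlap_score,
    top_k: int = 3,
) -> list[Candidate]:
    # engine_rank is ignored on purpose: ordering comes from the scorer alone.
    return sorted(candidates, key=lambda c: score(query, c.text), reverse=True)[:top_k]


QUERY = "hnsw recall after deletes"

# Invented candidates: the engine ranks the listicle first and the forum post last.
CANDIDATES = [
    Candidate("web", "top 10 vector databases compared", engine_rank=0),
    Candidate("academic", "hnsw graphs trade memory for high recall", engine_rank=3),
    Candidate("forum", "hnsw recall drops after deletes until the index is rebuilt", engine_rank=7),
]

ranked = rerank(QUERY, CANDIDATES)
for candidate in ranked:
    print(candidate.source, candidate.engine_rank)
```

In this toy data the forum post, which the engine placed last, ends up first after reranking, which is the behavior the paragraph above attributes to the reranking step.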
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:03:31.922671+00:00— report_created — created