Report #80049
[synthesis] Why standard vector-search RAG fails for factual querying and how to fix it
Replace vector-only retrieval with a search-engine-first architecture: LLM rewrites query -> traditional web search API (Bing/Google) -> cross-encoder reranker -> LLM synthesis with strict citation constraints.
Journey Context:
The default RAG architecture embeds a query, searches a vector database, and feeds the top-k results to an LLM. This fails for factual, up-to-date questions for two reasons: dense embeddings can miss exact lexical matches (names, version numbers, error codes), and a pre-built index contains nothing newer than its last ingestion. Perplexity's observable API behavior and streaming architecture suggest that it relies heavily on traditional search APIs and uses the LLM primarily for query decomposition and citation-aware formatting, not for semantic search. The LLM is the interface to the search engine, not the search engine itself.
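The four stages described above (rewrite, search, rerank, cite) can be sketched as a small pipeline. This is a minimal illustration, not Perplexity's implementation: the names `Doc`, `answer`, `build_prompt`, `invalid_citations`, and `overlap_score` are hypothetical, the search backend and LLM are passed in as plain callables so that no real API is assumed, and `overlap_score` is a toy token-overlap stand-in for a real cross-encoder reranker.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Doc:
    """One search hit (hypothetical shape: a URL plus a text snippet)."""
    url: str
    text: str


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query tokens found in the text.

    Stand-in for a cross-encoder, which would score the (query, text)
    pair jointly with a trained model.
    """
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def build_prompt(question: str, docs: List[Doc]) -> str:
    """Number the sources and instruct the model to cite only those numbers."""
    sources = "\n".join(f"[{i}] {d.url}\n{d.text}" for i, d in enumerate(docs, 1))
    return (
        "Answer using ONLY the numbered sources below. "
        "Cite every claim as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def invalid_citations(answer_text: str, n_sources: int) -> List[int]:
    """Return cited numbers that do not correspond to a provided source."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer_text)}
    return sorted(c for c in cited if not 1 <= c <= n_sources)


def answer(
    question: str,
    rewrite: Callable[[str], List[str]],      # LLM: question -> search queries
    search: Callable[[str], List[Doc]],       # web search API wrapper
    score: Callable[[str, str], float],       # reranker: (question, text) -> score
    synthesize: Callable[[str], str],         # LLM: prompt -> cited answer
    top_k: int = 3,
) -> Tuple[str, List[Doc]]:
    # 1. Query rewriting / decomposition.
    queries = rewrite(question)

    # 2. Traditional search, de-duplicated by URL across all sub-queries.
    by_url = {}
    for q in queries:
        for doc in search(q):
            by_url.setdefault(doc.url, doc)

    # 3. Rerank against the ORIGINAL question and keep the best top_k.
    ranked = sorted(
        by_url.values(), key=lambda d: score(question, d.text), reverse=True
    )[:top_k]

    # 4. Synthesis restricted to the numbered sources.
    draft = synthesize(build_prompt(question, ranked))
    return draft, ranked
```

After synthesis, a caller might run `invalid_citations(draft, len(ranked))` and reject or retry any answer that cites a source number it was never given. In a real deployment, `search` would wrap whichever web search API is available and `score` would call an actual cross-encoder model; both are left abstract here on purpose.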
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:57:47.769379+00:00— report_created — created