Report #75626
[synthesis] RAG pipelines pass raw user queries directly to vector search, returning irrelevant or low-signal context that degrades generation quality
Implement a multi-stage retrieval chain: LLM-powered query transformation → broad vector/keyword retrieval → cross-encoder reranking → context assembly. Never skip the query transformation step.
Journey Context:
Naive RAG (embed query → similarity search → stuff into prompt) is what most tutorials teach, but it tends to fail in production because user queries are short, ambiguous, and phrased differently from the documents they are meant to match. Perplexity's observable API behavior indicates that it rewrites and decomposes queries before retrieval, so a single user question becomes multiple optimized search queries. Cursor's codebase indexing similarly combines and ranks results from multiple retrieval strategies (keyword, semantic, filename). The synthesis: the retrieval chain needs at least three stages before context assembly. First, query transformation: an LLM rewrites the query into retrieval-optimized forms (expanding abbreviations, adding synonyms, decomposing compound questions). Second, broad retrieval: cast a wide net with vector and/or keyword search. Third, reranking: a cross-encoder scores each candidate for relevance to the original intent. This costs more per query, but it reduces hallucination caused by irrelevant context, which this report treats as the largest source of LLM errors in RAG systems.
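The stages above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not a production implementation: every name here (Doc, ABBREVIATIONS, transform_query, broad_retrieve, overlap_score, rerank, assemble_context) is hypothetical. The rule-based transform_query stands in for an LLM call, keyword overlap stands in for vector/keyword search, and overlap_score stands in for a cross-encoder.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase and split into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# Hypothetical abbreviation table; a real system would ask an LLM instead.
ABBREVIATIONS = {"k8s": "kubernetes", "db": "database"}


def transform_query(query: str) -> list[str]:
    """Stage 1 stand-in: expand abbreviations, split compound questions on 'and'."""
    expanded = " ".join(ABBREVIATIONS.get(w.lower(), w) for w in query.split())
    parts = [p.strip() for p in re.split(r"\band\b", expanded) if p.strip()]
    # Keep the full expanded query plus each sub-question, dropping duplicates.
    return list(dict.fromkeys([expanded, *parts]))


def broad_retrieve(queries: list[str], corpus: list[Doc], per_query_k: int = 3) -> list[Doc]:
    """Stage 2 stand-in: keyword overlap per query, union of all hits."""
    candidates: dict[str, Doc] = {}
    for q in queries:
        q_tokens = tokenize(q)
        ranked = sorted(corpus, key=lambda d: len(q_tokens & tokenize(d.text)), reverse=True)
        for doc in ranked[:per_query_k]:
            if q_tokens & tokenize(doc.text):  # skip documents with no overlap at all
                candidates[doc.doc_id] = doc
    return list(candidates.values())


def overlap_score(query: str, passage: str) -> float:
    """Jaccard overlap; a placeholder for a cross-encoder relevance score."""
    q, p = tokenize(query), tokenize(passage)
    return len(q & p) / len(q | p) if q | p else 0.0


def rerank(original_query: str, docs: list[Doc],
           score_fn: Callable[[str, str], float], top_n: int = 2) -> list[Doc]:
    """Stage 3: score candidates against the ORIGINAL query, keep the best top_n."""
    return sorted(docs, key=lambda d: score_fn(original_query, d.text), reverse=True)[:top_n]


def assemble_context(docs: list[Doc]) -> str:
    """Stage 4: format the surviving passages for the generation prompt."""
    return "\n".join(f"[{d.doc_id}] {d.text}" for d in docs)


corpus = [
    Doc("a", "Restart kubernetes pods with kubectl rollout restart"),
    Doc("b", "Back up the database with pg_dump"),
    Doc("c", "Office lunch menu for the week"),
]
user_query = "How do I restart k8s pods and back up the db"

queries = transform_query(user_query)
candidates = broad_retrieve(queries, corpus)
top_docs = rerank(user_query, candidates, overlap_score)
context = assemble_context(top_docs)
print(context)
```

In this toy corpus the broad stage also returns the unrelated lunch-menu document because it shares a stop word with the query, and the reranking stage drops it. The score_fn parameter is the seam where a real cross-encoder would be plugged in, and transform_query is the seam for the LLM rewrite.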
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:32:04.818354+00:00— report_created — created