Report #47860
[synthesis] RAG pipeline returns irrelevant chunks despite good embeddings and a strong generator model
Architect the re-ranking stage as your primary quality investment. Over-fetch from retrieval (top 50 or more), then use a cross-encoder or LLM-based re-ranker to filter aggressively down to the top 5 to 10 chunks before generation. The retrieval step is a recall mechanism, not a precision mechanism.
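The two-stage shape described above can be sketched as follows. This is a minimal illustration, not a production implementation: `retrieve_then_rerank`, `toy_retrieve`, `toy_score`, and `CORPUS` are hypothetical names, and the token-overlap scorer is only a stand-in for a real cross-encoder or LLM-based re-ranker so that the example runs without external services.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases, for illustration only.
Retriever = Callable[[str, int], List[str]]  # (query, k) -> candidate chunks
Scorer = Callable[[str, str], float]         # (query, chunk) -> relevance score


def retrieve_then_rerank(
    query: str,
    retrieve: Retriever,
    score: Scorer,
    fetch_k: int = 50,
    keep_k: int = 5,
) -> List[Tuple[str, float]]:
    """Over-fetch up to fetch_k candidates, then keep the keep_k best by score."""
    candidates = retrieve(query, fetch_k)  # recall-oriented stage
    scored = [(chunk, score(query, chunk)) for chunk in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # precision-oriented stage
    return scored[:keep_k]


# Toy stand-ins (assumptions, not real components).
CORPUS = [
    "Re-rankers score each query and chunk pair jointly.",
    "Chunk size tuning changes how documents are split.",
    "Bananas are rich in potassium.",
    "A cross-encoder re-ranker reads the query and chunk together.",
]


def toy_retrieve(query: str, k: int) -> List[str]:
    # Stand-in for a vector search: returns the first k chunks unfiltered.
    return CORPUS[:k]


def toy_score(query: str, chunk: str) -> float:
    # Stand-in for a cross-encoder: fraction of query tokens found in the chunk.
    q_tokens = set(query.lower().split())
    c_tokens = set(chunk.lower().replace(".", "").split())
    return len(q_tokens & c_tokens) / len(q_tokens) if q_tokens else 0.0


top = retrieve_then_rerank(
    "cross-encoder re-ranker", toy_retrieve, toy_score, fetch_k=50, keep_k=2
)
```

In a real pipeline, `retrieve` would wrap the vector store query and `score` would wrap the re-ranker call; keeping them as plain callables lets either stage be swapped without touching the other.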
Journey Context:
Perplexity's observable API chain retrieves broadly, then re-ranks. Cohere built an entire product (Rerank) around this single step. You.com's engineering blog identifies re-ranking as their quality differentiator. The cross-product synthesis: each of these products found that embedding cosine similarity alone is insufficient for generation-grade relevance, and that the re-ranker is where answer quality is actually determined. The common mistake is endlessly tuning chunk size or embedding models when the bottleneck is rank precision.
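One way to check whether rank precision is the bottleneck, rather than chunk size or the embedding model, is to compare recall over the wide candidate set with precision over the short list that reaches the generator. The sketch below uses the standard recall@k and precision@k definitions; the chunk ids and relevance labels are made up for illustration and do not come from any measured system.

```python
from typing import List, Set


def recall_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Share of all relevant chunks that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for cid in ranked_ids[:k] if cid in relevant_ids)
    return hits / len(relevant_ids)


def precision_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Share of the top k that is relevant."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    hits = sum(1 for cid in top if cid in relevant_ids)
    return hits / len(top)


# Hypothetical labeled example: 10 retrieved chunk ids, 2 judged relevant.
ranked = ["c7", "c2", "c9", "c4", "c1", "c3", "c8", "c5", "c6", "c0"]
relevant = {"c1", "c5"}

wide_recall = recall_at_k(ranked, relevant, k=10)         # relevant chunks were fetched
narrow_precision = precision_at_k(ranked, relevant, k=3)  # but none reached the top 3
```

High recall over the wide set combined with low precision over the short list suggests that retrieval is finding the right chunks and the ordering is what needs work, which is the situation a re-ranker addresses.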
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:48:53.932488+00:00— report_created — created