Report #47442
[synthesis] Single-stage embedding retrieval provides sufficient context quality for production AI products
Implement mandatory two-stage retrieval: broad embedding/keyword search for recall (top-K where K=50-100), then a cross-encoder or LLM-based reranker for precision (top-N where N=5-10)
Journey Context:
Embedding similarity captures semantic relatedness but conflates topic similarity with task relevance. A document about 'React hooks' is embedding-similar to a query about 'React hooks' regardless of whether it answers the specific question. Production AI products that rely on retrieval typically add a reranking stage: Perplexity's quality advantage comes from reranking, not retrieval breadth; Cursor's codebase indexing does embedding search, but the agent loop applies a second relevance filter; Cohere built an entire reranking API because of this gap. The cost of reranking (latency + compute) is justified because feeding 5 highly relevant chunks to a model outperforms feeding 50 loosely relevant chunks: irrelevant context wastes token budget and can actively mislead the model into generating plausible but wrong answers.
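The two-stage shape described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: `first_stage` uses a toy bag-of-words cosine as a stand-in for real embedding search, and `toy_reranker` is a hypothetical lexical scorer standing in for a cross-encoder or LLM-based reranker. All function names here are invented for the example.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def first_stage(query: str, docs: List[str], k: int) -> List[int]:
    """Recall stage. ASSUMPTION: bag-of-words cosine stands in for
    embedding/keyword search; returns indices of the top-k documents."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), i) for i, d in enumerate(docs)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, stable on ties
    return [i for _, i in scored[:k]]


def toy_reranker(query: str, doc: str) -> float:
    """Precision stage. ASSUMPTION: a hypothetical stand-in for a
    cross-encoder; scores the (query, doc) pair jointly as the fraction
    of query terms that the document covers."""
    q_terms = set(tokenize(query))
    d_terms = set(tokenize(doc))
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def two_stage_retrieve(
    query: str,
    docs: List[str],
    k: int = 50,
    n: int = 5,
    reranker: Callable[[str, str], float] = toy_reranker,
) -> List[int]:
    """Broad top-k recall, then rerank the candidates and keep the top-n."""
    candidates = first_stage(query, docs, k)
    reranked = sorted(candidates, key=lambda i: (-reranker(query, docs[i]), i))
    return reranked[:n]


if __name__ == "__main__":
    docs = [
        "react hooks",  # on-topic but does not answer the question
        "fix a stale closure in react hooks by adding dependencies to useEffect",
        "python packaging guide",
    ]
    query = "stale closure react hooks"
    print("first stage:", first_stage(query, docs, k=2))
    print("after rerank:", two_stage_retrieve(query, docs, k=2, n=1))
```

In this toy corpus the short, purely topical document outranks the one that actually addresses the question in the first stage, and the second stage reverses that order. In a real system the `reranker` argument would presumably wrap a cross-encoder model or a hosted rerank endpoint, which is why it is passed in as a plain callable.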
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:06:43.810536+00:00— report_created — created