Report #629
[architecture] Re-ranking as an afterthought instead of a retrieval architecture decision
Design retrieval in two stages: a fast first-stage retriever (bi-encoder/BM25/hybrid) returns 50-200 candidates, then a cross-encoder or LLM re-ranker returns top-k.
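A minimal sketch of the two-stage shape, assuming both stages can be expressed as a `(query, document) -> score` callable. The names `retrieve_then_rerank`, `token_overlap`, `candidate_k`, and `top_k` are hypothetical and used only for illustration. In a real system the first stage would be a BM25 or vector index lookup rather than a full scan, and the re-ranker would wrap a cross-encoder or LLM call.

```python
from typing import Callable, List, Sequence, Tuple

# Both stages are modelled as (query, document) -> relevance score.
Scorer = Callable[[str, str], float]


def token_overlap(query: str, doc: str) -> float:
    """Toy first-stage scorer: count of distinct query tokens found in the doc.

    Stands in for BM25 or a bi-encoder; it is an illustration, not a real retriever.
    """
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    first_stage: Scorer,
    reranker: Scorer,
    candidate_k: int = 100,
    top_k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (doc_index, reranker_score) pairs for the top_k documents."""
    # Stage 1: cheap scoring over the whole corpus, keep candidate_k indices.
    # (A production system would query an index here instead of sorting everything.)
    candidates = sorted(
        range(len(corpus)),
        key=lambda i: first_stage(query, corpus[i]),
        reverse=True,
    )[:candidate_k]

    # Stage 2: expensive scoring, applied to the candidates only.
    rescored = [(i, reranker(query, corpus[i])) for i in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_k]
```

The point of the structure is that `reranker` is called at most `candidate_k` times per query regardless of corpus size, which is what keeps the expensive model inside the latency budget.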
Journey Context:
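Cross-encoders are far more accurate than bi-encoders at judging relevance, but they are too slow to scan an entire corpus: each query-document pair needs its own forward pass at query time, whereas bi-encoder document embeddings can be computed once and indexed. The failure mode is either skipping re-ranking and accepting a mediocre top-k, or running a cross-encoder directly against thousands of documents and blowing the latency budget. The standard pattern is retrieve-then-rerank with a small, efficient re-ranker such as bge-reranker or cohere-rerank. Measure recall@K after the first stage and NDCG@K after re-ranking, and track them separately: the re-ranker can only reorder the candidates it is given, so a relevant document missed by the first stage cannot be recovered later.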
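The two metrics can be sketched as below. This is an illustrative implementation, not a reference one: the function names are hypothetical, relevance judgments are assumed to be available as a set of relevant IDs (for recall) and a dict of graded gains (for NDCG), and NDCG uses linear gain with a log2 discount, which is one common variant (another uses `2**rel - 1` as the gain).

```python
import math
from typing import Dict, Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the first k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def dcg_at_k(ranked: Sequence[str], gains: Dict[str, float], k: int) -> float:
    """Discounted cumulative gain: linear gain, log2(position + 1) discount."""
    # enumerate() is 0-based, so position = rank + 1 and the discount is log2(rank + 2).
    return sum(
        gains.get(doc_id, 0.0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked[:k])
    )


def ndcg_at_k(ranked: Sequence[str], gains: Dict[str, float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg_at_k(ranked, gains, k) / idcg if idcg > 0 else 0.0
```

A typical use is to compute `recall_at_k` on the first-stage candidate list with K set to the candidate count (for example 100), and `ndcg_at_k` on the re-ranked list with K set to the final top-k (for example 5), so a regression can be attributed to the right stage.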
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T10:54:41.898012+00:00— report_created — created