Report #52952
[cost_intel] Using text-embedding-3-large for both retrieval and reranking in RAG
Use text-embedding-3-small ($0.02/1M tokens) or Cohere embed-english-v3 ($0.10/1M tokens) for initial retrieval (MRR@10 within 3% of large on BEIR). Reserve text-embedding-3-large ($0.13/1M tokens) for the reranking stage only, or drop it entirely and use GPT-4o-mini as the reranker. Moving retrieval from large to small embeddings cuts embedding costs by 6.5x ($0.13 / $0.02) with a <2% drop in final accuracy.
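The 6.5x figure is simply the ratio of the two per-token list prices quoted above. A minimal sketch of that arithmetic, assuming the prices in this report are still current (check the providers' pricing pages) and using a hypothetical 500M-token corpus. The ratio does not depend on corpus size, and it covers embedding spend only, not reranker calls.

```python
from decimal import Decimal

# USD per 1M input tokens, as quoted in this report (assumed current).
PRICE_PER_1M = {
    "text-embedding-3-small": Decimal("0.02"),
    "embed-english-v3": Decimal("0.10"),
    "text-embedding-3-large": Decimal("0.13"),
}


def embedding_cost(model: str, tokens: int) -> Decimal:
    """Cost in USD of embedding `tokens` input tokens with `model`."""
    return PRICE_PER_1M[model] * tokens / Decimal(1_000_000)


# Hypothetical corpus size, for illustration only.
corpus_tokens = 500_000_000

for model in PRICE_PER_1M:
    print(f"{model}: ${embedding_cost(model, corpus_tokens)}")

# Decimal keeps the ratio exact instead of a float like 6.499999...
ratio = embedding_cost("text-embedding-3-large", corpus_tokens) / embedding_cost(
    "text-embedding-3-small", corpus_tokens
)
print(f"large / small = {ratio}x")
```

Using `Decimal` rather than floats keeps per-token prices exact, which matters once costs are summed over billions of tokens.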
Journey Context:
Teams assume 'bigger embeddings = better RAG,' but conflate retrieval (recall breadth) with ranking (precision). Small embeddings capture semantic neighborhoods adequately; the heavy lifting of fine-grained ordering is better handled by a lightweight cross-attention reranker, or even by a cheap LLM judging relevance. Using large embeddings for the initial brute-force vector search is economic overkill: the quality gain is marginal compared with running a good reranker over small-embedding candidates.
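The retrieve-then-rerank split described above can be sketched as follows. Every scorer here is a toy stand-in, not a provider API: `cheap_embed` (term-count vectors) plays the role of text-embedding-3-small, and `toy_rerank_score` (Jaccard overlap) stands in for a cross-encoder or a GPT-4o-mini relevance judgment. Both names are hypothetical. What the sketch does show is the structure: the cheap scorer ranks every document for recall, while the expensive scorer is called only `k` times per query for precision.

```python
import math
from collections import Counter
from collections.abc import Callable


# Hypothetical stand-in for a cheap bi-encoder such as text-embedding-3-small.
# A real pipeline would call an embedding API once per document and store the
# vectors in an index; term counts keep this sketch offline and deterministic.
def cheap_embed(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing terms without inserting them.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical stand-in for the precision stage (cross-encoder or LLM judge).
# Jaccard overlap is used only so the sketch runs; it is not a real reranker.
def toy_rerank_score(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def retrieve_then_rerank(
    query: str,
    docs: list[str],
    k: int = 3,
    top_n: int = 1,
    scorer: Callable[[str, str], float] = toy_rerank_score,
) -> list[str]:
    # Stage 1 (recall): score every document with the cheap embedding.
    q_vec = cheap_embed(query)
    doc_vecs = [cheap_embed(d) for d in docs]  # precomputed offline in practice
    candidates = sorted(
        range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]), reverse=True
    )[:k]
    # Stage 2 (precision): the expensive scorer sees only k candidates.
    reranked = sorted(candidates, key=lambda i: scorer(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked[:top_n]]


docs = [
    "reset your password from the account settings page",
    "password policy requires twelve characters",
    "billing invoices are emailed monthly",
    "how to reset a forgotten password",
    "shipping times vary by region",
]
print(retrieve_then_rerank("how do I reset my password", docs))
# ['how to reset a forgotten password']
```

Because stage 2 touches only `k` candidates, reranker cost grows with `k` per query rather than with corpus size, while the stage 1 embedding is paid for every document in the corpus.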
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:22:33.034067+00:00 - report_created - created