Report #1270
[research] Which embedding model should I use for production RAG in 2026?
For self-hosted multilingual RAG, default to BAAI/bge-m3 (MIT license, 1024-dim, 8192 context, dense + sparse + multi-vector in one model) paired with bge-reranker-v2-m3. For a hosted API with best retrieval quality, use Voyage-3. For a safe, cheap, broadly integrated hosted default, use OpenAI text-embedding-3-large. Do not pick by MTEB average alone; measure recall@10 on your own query-document pairs.
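The recall@10 check recommended above can be sketched in a few lines of plain Python. This is a minimal sketch, not a full evaluation harness: the function name `recall_at_k`, the dictionary layout, and the sample IDs are assumptions made for illustration. It expects that each candidate model has already produced a ranked list of document IDs per query.

```python
from typing import Dict, List, Set


def recall_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Mean recall@k over queries that have at least one labelled relevant doc."""
    scores = []
    for query_id, gold in relevant.items():
        if not gold:
            continue  # unlabelled queries carry no signal, so skip them
        top_k = set(ranked.get(query_id, [])[:k])
        scores.append(len(top_k & gold) / len(gold))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels and retrieval output for two queries.
relevant = {"q1": {"d1", "d2"}, "q2": {"d9"}}
ranked = {"q1": ["d1", "d5", "d2"], "q2": ["d3", "d4"]}

print(recall_at_k(ranked, relevant, k=10))  # 0.5
print(recall_at_k(ranked, relevant, k=1))   # 0.25
```

Running the same labelled set through each candidate embedder (with and without the reranker) gives a like-for-like number that an MTEB average cannot.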
Journey Context:
Leaderboard chasing fails because MTEB averages blend classification, clustering, and retrieval, so a high average does not guarantee strong retrieval. BGE-M3 stays the workhorse: one model emits both dense and sparse (lexical) representations, so hybrid retrieval does not require a separately maintained BM25/lexical pipeline, and it covers 100+ languages. Voyage-3 leads hosted retrieval on code/legal/finance but is a commercial API. OpenAI text-embedding-3-large is stable and supports Matryoshka dimension truncation, but it is no longer SOTA. Newer LLM-based embedders (NV-Embed-v2, Qwen3-Embedding-8B, GTE-Qwen2) score higher on MTEB but are larger and slower; use them only when recall gains justify latency/cost. Always add a reranker and evaluate on a labelled domain set.
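To illustrate the Matryoshka dimension truncation mentioned above: a Matryoshka-trained embedding can be shortened by keeping its leading dimensions and re-normalizing to unit length. The sketch below is a client-side illustration only; the helper name `truncate_and_normalize` and the toy vector are assumptions. The OpenAI embeddings API also accepts a `dimensions` parameter that performs the shortening server-side.

```python
import math
from typing import List


def truncate_and_normalize(vec: List[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalized
    return [x / norm for x in head]


# Toy 3-dim vector standing in for a real embedding.
short = truncate_and_normalize([3.0, 4.0, 12.0], 2)
print(short)
```

Shorter vectors cut index size and search latency at some cost in recall, so it is worth re-measuring recall on your labelled set at each candidate dimension.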
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T19:57:29.374299+00:00— report_created — created