Report #196
[research] What embedding model should I use for RAG in 2025/2026?
Default to BGE-M3 for production multilingual RAG: it packs dense, sparse, and multi-vector retrieval into one MIT-licensed model. If you need the highest retrieval accuracy and can pay the compute cost, use a top MTEB model such as Qwen3-Embedding or Llama-Embed-Nemotron. Either way, evaluate on your own corpus with recall@k and MRR; do not pick an embedding model by leaderboard average alone.
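To make the recall@k and MRR advice concrete, here is a minimal sketch in plain Python. It assumes you already have, per query, a ranked list of retrieved document IDs and a set of known-relevant IDs; the names `runs`, `qrels`, and `evaluate` are illustrative, not part of any library.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    runs: Dict[str, List[str]],   # query id -> ranked doc ids (hypothetical format)
    qrels: Dict[str, Set[str]],   # query id -> relevant doc ids (hypothetical format)
    k: int = 10,
) -> Dict[str, float]:
    """Average recall@k and MRR over all queries that have relevance labels."""
    queries = [q for q, rel in qrels.items() if rel]
    if not queries:
        raise ValueError("qrels contains no labelled queries")
    n = len(queries)
    recall = sum(recall_at_k(runs.get(q, []), qrels[q], k) for q in queries) / n
    mrr = sum(reciprocal_rank(runs.get(q, []), qrels[q]) for q in queries) / n
    return {f"recall@{k}": recall, "mrr": mrr}
```

Running `evaluate` once per candidate model on the same labelled queries gives a like-for-like comparison on your own data, which is the comparison the recommendation above asks for.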
Journey Context:
OpenAI text-embedding-3-large is no longer the automatic choice: open-weight models now match or exceed it on MTEB retrieval at zero token cost. The leaderboard winner (Qwen3-Embedding-8B) is stronger but heavier; BGE-M3's practical edge is handling lexical mismatches via sparse vectors and supporting 100+ languages out of the box. Many teams overpay for embedding APIs when a 560M–1.5B open model is sufficient. The only reliable signal is an eval set sampled from your actual documents and queries.
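The sparse-vector point can be illustrated with a toy hybrid scorer. This is a sketch under stated assumptions, not BGE-M3's own API: it assumes you already have a dense vector and a token-to-weight sparse map for both the query and the document, and the blending weight `alpha` is a hypothetical knob you would tune on your eval set.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lexical_score(query: Dict[str, float], doc: Dict[str, float]) -> float:
    """Sum of weight products over tokens shared by query and document.

    Keys are plain strings here for readability; a real model would
    typically key these maps by token id.
    """
    return sum(w * doc[tok] for tok, w in query.items() if tok in doc)


def hybrid_score(
    dense_q: List[float],
    dense_d: List[float],
    sparse_q: Dict[str, float],
    sparse_d: Dict[str, float],
    alpha: float = 0.7,  # hypothetical blend weight: 1.0 = dense only
) -> float:
    """Weighted blend of dense similarity and sparse token-overlap score."""
    dense = cosine(dense_q, dense_d)
    sparse = lexical_score(sparse_q, sparse_d)
    return alpha * dense + (1.0 - alpha) * sparse
```

The sparse term rewards exact overlap on tokens such as product codes or rare names, which a dense similarity alone can underweight. Whether the blend helps on a given corpus is an empirical question for the eval set.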
Lifecycle
2026-06-12T21:41:40.362322+00:00— report_created — created