Report #4766
[research] Which embedding model should I use for RAG in 2026?
Start with the MTEB and BEIR leaderboards, but choose by task type: for multilingual or code retrieval, use Qwen3-Embedding (0.6B–8B); for top English retrieval through a hosted API, use Cohere embed-v4 or Voyage AI voyage-3-large; for cheap self-hosting, use BGE-M3 or GTE-large. Always add a reranker (a cross-encoder or late-interaction model) and evaluate on your own queries, because leaderboard averages hide domain gaps.
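To make "evaluate on your own queries" concrete, here is a minimal sketch of a retrieve-then-rerank evaluation using only the Python standard library. It assumes binary relevance labels. The names `runs`, `qrels`, and `score_fn` are hypothetical: `score_fn` stands in for whatever reranker you call, and nothing here reflects a specific vendor API.

```python
import math
from typing import Callable, Dict, List, Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """nDCG@k with binary relevance (a document is either relevant or not)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],  # hypothetical reranker: (query, doc_id) -> score
    top_n: int = 10,
) -> List[str]:
    """Reorder first-stage candidates by reranker score, highest first."""
    ordered = sorted(candidates, key=lambda doc_id: score_fn(query, doc_id), reverse=True)
    return ordered[:top_n]


def mean_metric(
    runs: Dict[str, List[str]],   # query id -> ranked doc ids returned by a system
    qrels: Dict[str, Set[str]],   # query id -> doc ids judged relevant
    metric: Callable[[Sequence[str], Set[str], int], float],
    k: int = 10,
) -> float:
    """Average a per-query metric over every query that has relevance labels."""
    scores = [metric(runs.get(qid, []), rel, k) for qid, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

One way to use this: build `runs` once from the embedding model alone and once after `rerank`, then compare `mean_metric(runs, qrels, ndcg_at_k)` for each candidate model on a few dozen labeled queries from your own domain.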
Journey Context:
MTEB has superseded BEIR as the canonical comparison, yet a high MTEB mean does not guarantee good retrieval on your documents. Newer models such as Qwen3-Embedding and Jina-embeddings-v3 are task-targeted and multilingual; Cohere and Voyage lead on English dense retrieval but cost more. BRIGHT (ICLR 2025) showed that the best MTEB models score only about 18 nDCG@10 on reasoning-intensive retrieval, so if your RAG involves complex inference you need reasoning-aware evaluation, not just MTEB. Truncating embedding dimensions is surprisingly robust even for models not trained with Matryoshka representation learning, but aggressive truncation still degrades retrieval quality. Storage cost scales linearly with dimension count, so do not default to 3072-D unless you have measured the gain.
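The truncation and storage points can be checked with a few lines of arithmetic. The following is a rough sketch, assuming float32 vectors and counting raw vector bytes only (no index overhead, metadata, or quantization). The function names are illustrative and not taken from any library.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale to unit L2 norm."""
    if dims <= 0 or dims > len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    # Re-normalizing keeps cosine and dot-product scores comparable after truncation.
    return [x / norm for x in head] if norm > 0 else head


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def raw_index_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage only; assumes float32 (4 bytes) unless told otherwise."""
    return num_vectors * dims * bytes_per_value
```

For one million float32 vectors, `raw_index_bytes(1_000_000, 3072)` gives 12,288,000,000 bytes (about 12.3 GB), while `raw_index_bytes(1_000_000, 1024)` gives 4,096,000,000 bytes (about 4.1 GB). Whether the extra dimensions earn that cost is something to measure by truncating stored vectors with `truncate_and_normalize` and rerunning retrieval on your own queries.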
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:02:42.821675+00:00— report_created — created