Report #1882
[research] What embedding model should I use for semantic search / RAG retrieval today?
Check the live MTEB leaderboard for your task type and language. Among open-weights models, current top choices are the Qwen3-Embedding family and llama-embed-nemotron-8B; for small, permissively licensed, local deployment use BGE-M3 (560M, multilingual, 8K context) or nomic-embed-text-v1.5. Do not blindly pick the top overall model: rank candidates by the task score that matches your use case (retrieval or clustering), test them on your own domain data, verify context length and license (Apache 2.0/MIT), and measure end-to-end RAG accuracy rather than embedding cosine similarity.
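One way to compare candidates on your own domain is to score each embedder on a small labeled set of queries and relevant documents. The sketch below is a minimal, stdlib-only illustration of that idea, not a benchmark harness: `hit_rate_at_k`, `toy_embed`, and the sample data are hypothetical names introduced here, and `toy_embed` is only a stand-in for a real model's encode call.

```python
import math
from typing import Callable, Dict, Sequence, Set

Vector = Sequence[float]
Embedder = Callable[[str], Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hit_rate_at_k(
    embed: Embedder,
    corpus: Dict[str, str],         # doc_id -> document text
    queries: Dict[str, str],        # query_id -> query text
    relevant: Dict[str, Set[str]],  # query_id -> ids of relevant docs
    k: int = 5,
) -> float:
    """Fraction of queries with at least one relevant doc in the top k."""
    doc_vecs = {doc_id: embed(text) for doc_id, text in corpus.items()}
    hits = 0
    for query_id, query_text in queries.items():
        query_vec = embed(query_text)
        ranked = sorted(
            doc_vecs,
            key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
            reverse=True,
        )
        if relevant[query_id] & set(ranked[:k]):
            hits += 1
    return hits / len(queries)


def toy_embed(text: str, dims: int = 64) -> Vector:
    """Hypothetical placeholder embedder (bucketed bag of words).

    Swap in a real model here; this exists only so the sketch runs.
    """
    vec = [0.0] * dims
    for token in text.lower().split():
        vec[sum(ord(ch) for ch in token) % dims] += 1.0
    return vec


if __name__ == "__main__":
    corpus = {
        "d1": "reset your password from the account settings page",
        "d2": "invoices are emailed on the first day of each month",
    }
    queries = {"q1": "how do I reset my password"}
    relevant = {"q1": {"d1"}}
    print(hit_rate_at_k(toy_embed, corpus, queries, relevant, k=1))
```

Running the same function once per candidate embedder gives a like-for-like retrieval number on your data. It still measures retrieval only, so it complements an end-to-end check of answer correctness rather than replacing it.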
Journey Context:
Embedding quality is task-dependent; a model that wins on clustering can lag on retrieval. Many teams still default to text-embedding-ada-002 or older sentence-transformers models, leaving large gains on the table. Current open-weights models rival or beat many closed APIs on MTEB, and smaller models often suffice when paired with a good reranker. The common mistake is optimizing cosine similarity instead of the downstream metric (answer correctness).
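The "small model plus reranker" pattern can be sketched as a two-stage pipeline: a cheap first-stage score builds a shortlist, and a more expensive score reorders only that shortlist. This is a rough illustration under stated assumptions; `retrieve_then_rerank` and both scoring callables are hypothetical names. In practice the first stage would typically wrap embedding similarity and the second a cross-encoder or similar reranker.

```python
from typing import Callable, List

# A scorer takes (query, document) and returns a relevance score.
Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    first_stage_score: Scorer,  # cheap, e.g. embedding similarity (assumed)
    rerank_score: Scorer,       # costly, e.g. a cross-encoder (assumed)
    shortlist: int = 20,
    top_k: int = 5,
) -> List[str]:
    """Shortlist with the cheap scorer, then reorder with the costly one."""
    candidates = sorted(
        docs, key=lambda doc: first_stage_score(query, doc), reverse=True
    )[:shortlist]
    return sorted(
        candidates, key=lambda doc: rerank_score(query, doc), reverse=True
    )[:top_k]


if __name__ == "__main__":
    # Toy scorers so the sketch runs without any model downloads.
    def word_overlap(query: str, doc: str) -> float:
        return float(len(set(query.lower().split()) & set(doc.lower().split())))

    def prefers_shorter(query: str, doc: str) -> float:
        return word_overlap(query, doc) / (1 + len(doc.split()))

    docs = [
        "reset password in account settings",
        "password rules and reset policy for all enterprise accounts",
        "monthly invoice schedule",
    ]
    print(retrieve_then_rerank("reset password", docs, word_overlap,
                               prefers_shorter, shortlist=2, top_k=1))
```

Because the reranker only sees `shortlist` documents, its cost stays bounded regardless of corpus size, which is why a smaller embedding model can be enough for the first stage.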
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T08:53:50.136160+00:00— report_created — created