Report #256
[research] Which embedding model should I use for RAG in 2025?
For self-hosted production, default to BAAI/bge-m3 (MIT license; dense, sparse, and multi-vector retrieval; 100+ languages). If you need top open-weight accuracy and have the GPU memory, use Qwen3-Embedding-8B. Among commercial APIs, Voyage's voyage-3-large is among the strongest on retrieval-focused benchmarks, while OpenAI's text-embedding-3-large is the safe default. Whatever you pick, benchmark it on your own queries and documents: MTEB averages often misrank models for a specific domain.
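A minimal, model-agnostic sketch of such a benchmark follows. It assumes you have already embedded a small labeled set of your own queries and documents with each candidate model and that you know which documents are relevant to each query. The function names, the dict layout, and the toy vectors are illustrative assumptions, not part of any library. It reports recall@k and MRR@k using brute-force cosine ranking.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def evaluate(query_vecs, doc_vecs, relevant, k=10):
    """Return (recall@k, MRR@k) averaged over labeled queries.

    query_vecs: dict query_id -> embedding (hypothetical layout)
    doc_vecs:   dict doc_id -> embedding
    relevant:   dict query_id -> set of relevant doc_ids (your own labels)
    """
    recalls, rrs = [], []
    for qid, qvec in query_vecs.items():
        gold = relevant.get(qid, set())
        if not gold:
            continue  # skip queries with no relevance labels
        # Rank every document by similarity to this query, keep the top k.
        ranked = sorted(doc_vecs, key=lambda d: cosine(qvec, doc_vecs[d]), reverse=True)[:k]
        recalls.append(sum(1 for d in ranked if d in gold) / len(gold))
        # Reciprocal rank of the first relevant hit (0 if none in top k).
        rr = next((1.0 / rank for rank, d in enumerate(ranked, start=1) if d in gold), 0.0)
        rrs.append(rr)
    n = len(recalls)
    return (sum(recalls) / n, sum(rrs) / n) if n else (0.0, 0.0)


# Toy example: two queries, three documents, 2-D "embeddings".
queries = {"q1": [1.0, 0.0], "q2": [0.0, 1.0]}
docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
labels = {"q1": {"d3"}, "q2": {"d2"}}
print(evaluate(queries, docs, labels, k=2))  # (1.0, 0.75)
```

Run `evaluate` once per candidate model on the same labeled set and compare. A few dozen realistic queries are often enough to expose a model whose MTEB rank does not carry over to your corpus.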
Journey Context:
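The embedding landscape matured rapidly: open-weight models now match or beat commercial APIs on many MTEB tasks. BGE-M3 became the de facto workhorse because it combines dense, sparse (lexical), and multi-vector retrieval in a single model, supports 100+ languages, and is MIT-licensed. Qwen3-Embedding tops the open-weight leaderboards, but its 8B variant is far larger than BGE-M3 (about 568M parameters); smaller 0.6B and 4B variants also exist. Commercial options (Voyage, Cohere embed-v4, OpenAI text-embedding-3) add convenience and long context windows but are billed per token. The most common failure mode is blindly picking the #1 MTEB model: retrieval quality depends heavily on your domain, chunk size, and query distribution. Matryoshka embeddings are now widely supported (for example by OpenAI text-embedding-3, Qwen3-Embedding, and voyage-3-large): generate full-dimension vectors, then truncate them to 256/512/768 dimensions to trade some accuracy for less storage and faster search. Truncate document and query vectors to the same dimension and re-normalize them; this only works for models trained with Matryoshka representation learning.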
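The BGE-M3 paper scores a query/passage pair in hybrid mode as a weighted sum of its three outputs: a dense score (inner product of normalized vectors), a sparse lexical score (sum of query-weight times passage-weight over shared tokens), and a multi-vector late-interaction score (for each query token vector, the best match in the passage, averaged). The pure-Python sketch below shows only that combination step. It assumes you already have the three outputs per text (for example from BAAI's FlagEmbedding package) in the hypothetical dict layout shown; the weights (0.4, 0.2, 0.4) are illustrative and should be tuned on your own evaluation set.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_score(q, p):
    # Assumes both vectors are already L2-normalized, so this equals cosine.
    return dot(q, p)


def sparse_score(q_weights, p_weights):
    # Lexical match: multiply the weights of tokens present in both texts.
    return sum(w * p_weights[t] for t, w in q_weights.items() if t in p_weights)


def multivector_score(q_vecs, p_vecs):
    # Late interaction: best passage match per query token vector, averaged.
    return sum(max(dot(qv, pv) for pv in p_vecs) for qv in q_vecs) / len(q_vecs)


def hybrid_score(q, p, weights=(0.4, 0.2, 0.4)):
    """Weighted sum of dense, sparse, and multi-vector scores.

    q, p: hypothetical dicts with keys "dense" (vector), "sparse"
    (token -> weight), and "colbert" (list of token vectors).
    """
    w_dense, w_sparse, w_multi = weights
    return (w_dense * dense_score(q["dense"], p["dense"])
            + w_sparse * sparse_score(q["sparse"], p["sparse"])
            + w_multi * multivector_score(q["colbert"], p["colbert"]))


query = {"dense": [1.0, 0.0], "sparse": {"rag": 0.5, "embed": 0.3},
         "colbert": [[1.0, 0.0], [0.0, 1.0]]}
passage = {"dense": [0.6, 0.8], "sparse": {"rag": 0.4, "model": 0.2},
           "colbert": [[0.6, 0.8]]}
print(hybrid_score(query, passage))  # approximately 0.56
```

Setting a weight to zero drops that signal, which makes it easy to check on your own queries whether the sparse or multi-vector outputs actually improve on dense-only retrieval.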
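Matryoshka truncation can be sketched in a few lines: keep the leading components and rescale to unit length, applying the same dimension to both stored document vectors and query vectors. This assumes the model was trained with Matryoshka representation learning (truncating an ordinary embedding this way degrades it badly); the helper names are hypothetical.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("vector is all zeros after truncation")
    return [x / norm for x in head]


def cosine_at_dim(query_vec, doc_vec, dim):
    # Both sides must be truncated to the SAME dimension before comparing.
    q = truncate_and_normalize(query_vec, dim)
    d = truncate_and_normalize(doc_vec, dim)
    return sum(x * y for x, y in zip(q, d))


print(truncate_and_normalize([3.0, 4.0, 12.0], 2))  # approximately [0.6, 0.8]
```

In practice you would truncate document vectors once at index time (that is where the storage and search-speed savings come from) and apply the same truncation to each incoming query. Some APIs do this server-side; OpenAI's text-embedding-3 models, for instance, accept a `dimensions` request parameter.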
Lifecycle
2026-06-13T01:40:38.827763+00:00— report_created — created