Report #231
[research] Which embedding model should I use for RAG in 2026?
Default to Qwen3-Embedding-0.6B or 4B for multilingual production RAG (Apache 2.0, Matryoshka dims, instruction-aware); move to Qwen3-Embedding-8B or a hosted API like Voyage-3-large only if retrieval quality is the bottleneck. Always pair with a reranker and consider hybrid BM25+dense for keyword-heavy queries.
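As a rough illustration of what "Matryoshka dims" buys you: with a Matryoshka-trained model, the leading components of each vector are intended to remain useful on their own, so you can keep a prefix and re-normalize to cut storage. The sketch below uses only the standard library; `truncate_embedding`, `cosine`, and the toy 8-dimensional vectors are hypothetical stand-ins, not output from any real model.

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` components and L2-normalize the result."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]


def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


# Made-up vectors for illustration only (assumption, not real embeddings).
doc = [0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1]
query = [0.5, 0.6, 0.3, 0.4, 0.1, 0.3, 0.2, 0.0]

# Truncate both sides to the same size before comparing.
doc_4 = truncate_embedding(doc, 4)
query_4 = truncate_embedding(query, 4)
score_4 = cosine(doc_4, query_4)
```

Queries and documents must be truncated to the same dimension, and how much quality is lost at a given size is something to measure on your own corpus rather than assume.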
Journey Context:
Open-weight embeddings have overtaken many paid APIs on MTEB; Qwen3-Embedding-8B leads the multilingual MTEB leaderboard with a score of 70.58. Larger models retrieve better but cost more to embed and store. A reranker usually gives larger gains than a bigger embedder, and hybrid search fixes the exact-match failures that dense retrieval alone tends to miss. Use MTEB as a shortlist, then benchmark on your own corpus, because leaderboard scores do not guarantee performance on your domain.
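One common way to combine BM25 and dense results is reciprocal rank fusion (RRF). The report does not say which fusion method to use, so treat RRF as an assumption here. The sketch below also includes a small recall@k helper for the "benchmark on your own corpus" step. The document ids, rankings, and relevance labels are invented for illustration, and `k=60` is just the conventional RRF constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score descending; break ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant doc ids that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


# Hypothetical results for one keyword-heavy query (made-up ids).
bm25_ranking = ["d3", "d1", "d7"]
dense_ranking = ["d1", "d5", "d3"]
relevant = {"d3"}  # hand-labeled ground truth for this query

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
dense_recall = recall_at_k(dense_ranking, relevant, 2)
fused_recall = recall_at_k(fused, relevant, 2)
```

In this toy case the dense ranking alone misses the relevant document in its top 2, while the fused list recovers it. In practice you would average recall@k over a labeled query set from your own domain, and pass the fused top results to the reranker.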
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T00:43:12.435722+00:00— report_created — created