Report #361
[research] Which embedding model should I use for RAG / semantic search in 2026?
For self-hosted/open-weight: start with Qwen3-Embedding-8B (top open-weight on MTEB, 32-4096 Matryoshka dims, 32K context, Apache 2.0) or BGE-M3 if you want dense+sparse+multi-vector hybrid retrieval in a single model. For commercial APIs: Google Gemini Embedding 001 is cheapest and leads API retrieval; Voyage-3-large / Voyage-code-3 are the quality/code-search leaders if budget allows. Always add a reranker (bge-reranker-v2-m3 or Cohere rerank-v3) and benchmark on your own corpus.
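As a rough illustration of "benchmark on your own corpus", here is a minimal sketch of recall@k computed over labelled queries. The function name `recall_at_k`, the query/document IDs, and the two ranked lists are all hypothetical toy data, not output from any of the models named above; in practice the rankings would come from your retriever with and without the reranker.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: Dict[str, List[str]],
                relevant: Dict[str, Set[str]],
                k: int) -> float:
    """Mean fraction of each query's relevant docs found in its top-k results."""
    scores = []
    for query_id, relevant_docs in relevant.items():
        if not relevant_docs:
            continue  # skip queries that have no labelled relevant docs
        top_k = set(ranked.get(query_id, [])[:k])
        scores.append(len(top_k & relevant_docs) / len(relevant_docs))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels: which docs are relevant for each query.
relevant = {"q1": {"d1", "d4"}, "q2": {"d2"}}

# Hypothetical rankings from two pipelines on the same queries.
retriever_only = {"q1": ["d3", "d1", "d5", "d4"], "q2": ["d7", "d8", "d2"]}
with_reranker = {"q1": ["d1", "d4", "d3", "d5"], "q2": ["d2", "d7", "d8"]}

print(recall_at_k(retriever_only, relevant, k=2))  # 0.25
print(recall_at_k(with_reranker, relevant, k=2))   # 1.0
```

Running the same function over each candidate model's rankings on a few hundred real queries is usually enough to see whether a leaderboard difference carries over to your data.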
Journey Context:
MTEB is the community standard, but it underweights domain-specific tasks. Open-source models now lead the aggregate leaderboard, yet BGE-M3 remains the production default because it is small, multilingual, and does hybrid retrieval out of the box. Code search needs code-specific embeddings (Voyage-code-3 or jina-embeddings-v2-base-code), not a prose model. A reranker is usually the highest-ROI 5 lines of code, giving 10-25 absolute recall points. Dimensionality is not quality: Matryoshka representations let you truncate vectors to fewer dimensions, trading a small quality loss for lower storage. Do not default to OpenAI ada-002 or text-embedding-3-large just because of vendor familiarity.
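The storage trade-off behind Matryoshka representations can be sketched as follows: keep the first `dims` components of a vector and re-normalize to unit length. This is a minimal sketch on toy vectors; the helper names are hypothetical, float32 storage (4 bytes per dimension) is an assumption, and truncation only preserves quality for models actually trained with Matryoshka-style objectives.

```python
import math
from typing import List


def truncate_and_normalize(vec: List[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale to unit L2 norm."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


def storage_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage, assuming float32 values by default."""
    return num_vectors * dims * bytes_per_value


# Toy 4-dimensional vector standing in for a real embedding.
full = [3.0, 4.0, 0.0, 0.0]
short = truncate_and_normalize(full, 2)
print(short)

# Raw storage for one million vectors at two dimensionalities.
print(storage_bytes(1_000_000, 4096))  # 16384000000
print(storage_bytes(1_000_000, 1024))  # 4096000000
```

Whether the smaller size is acceptable is an empirical question: measure retrieval quality at each candidate dimensionality on your own queries before committing to one.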
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T05:41:20.304760+00:00— report_created — created