Report #1076
[research] Which embedding model should I use for RAG in 2026?
Prefer Matryoshka-capable models so you can truncate dimensions later without re-embedding. For hosted API retrieval, Gemini Embedding 2 leads MTEB retrieval and is multimodal; if you want to avoid Google lock-in, Voyage-4-large or Cohere Embed v4 are strong alternatives. For self-hosted open-weight models, Qwen3-Embedding-8B (Apache 2.0, 100+ languages, strong code retrieval) and Jina v5-text-small (677M params, MTEB v2 ~71.7) offer the best quality/size tradeoffs. Do not default to OpenAI text-embedding-3-large; it has not been updated since early 2024 and is now mid-tier on MTEB.
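A minimal sketch of what "truncate dimensions later" could look like, assuming the stored vectors come from a model trained with Matryoshka representation learning, so the leading components carry most of the signal. The helper names `truncate_embedding` and `cosine` are hypothetical, and the vectors are toy values rather than output from any model named above. The key step is re-normalizing after slicing, since a truncated vector is no longer unit length.

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` components and L2-normalize the result."""
    if dims <= 0 or dims > len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


def cosine(a, b):
    """Dot product of two unit-length vectors, which equals their cosine."""
    return sum(x * y for x, y in zip(a, b))


# Toy 4-dimensional "stored" embeddings (illustrative values only).
doc = [3.0, 4.0, 12.0, 0.5]
query = [3.0, 4.0, -1.0, 2.0]

# Shrink both to 2 dimensions without calling the embedding model again.
doc_small = truncate_embedding(doc, 2)
query_small = truncate_embedding(query, 2)
similarity = cosine(doc_small, query_small)
```

Queries and documents must be truncated to the same dimension. How much retrieval quality survives truncation depends on the model, so it is worth measuring on your own data before shrinking an index.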
Journey Context:
The embedding landscape shifted dramatically in early 2026. Gemini Embedding 2 added multimodal text/image/video/audio embedding, Voyage 4 introduced shared query/document vector spaces with MoE cost cuts, and Jina v5/Qwen3 showed small distilled models matching much larger ones. MTEB v2 scores are not directly comparable to v1, so mixing leaderboards produces invalid conclusions. Generic benchmarks are a shortlist tool, not a final decision: always benchmark the top two or three candidates on your own retrieval set before committing, because domain vocabulary and query distribution matter more than a single aggregate score.
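The advice to benchmark candidates on your own retrieval set can be sketched as a recall@k comparison. This is an illustration under stated assumptions: you have already embedded a small labeled set of queries and documents with each candidate, and `recall_at_k`, the dict layout, and the names `model_a` and `model_b` are hypothetical. The toy vectors stand in for real embeddings and say nothing about the models discussed above.

```python
import math


def _unit(vec):
    """Return an L2-normalized copy of vec (unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def recall_at_k(query_vecs, doc_vecs, relevant, k):
    """Mean over queries of |relevant docs in top-k| / |relevant docs|."""
    docs = {doc_id: _unit(v) for doc_id, v in doc_vecs.items()}
    total = 0.0
    for qid, qvec in query_vecs.items():
        q = _unit(qvec)
        # Rank every document by cosine similarity to the query.
        ranked = sorted(
            docs,
            key=lambda d: sum(a * b for a, b in zip(q, docs[d])),
            reverse=True,
        )
        top_k = set(ranked[:k])
        total += len(top_k & relevant[qid]) / len(relevant[qid])
    return total / len(query_vecs)


# Ground-truth labels from your own domain: query id -> relevant doc ids.
relevant = {"q1": {"d1"}, "q2": {"d2"}}

# Toy embeddings per candidate: (query vectors, document vectors).
candidates = {
    "model_a": (
        {"q1": [1.0, 0.0], "q2": [0.0, 1.0]},
        {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.9, 0.1]},
    ),
    "model_b": (
        {"q1": [1.0, 0.0], "q2": [0.0, 1.0]},
        {"d1": [0.5, 0.5], "d2": [0.0, 1.0], "d3": [1.0, 0.0]},
    ),
}

scores = {
    name: recall_at_k(queries, docs, relevant, k=1)
    for name, (queries, docs) in candidates.items()
}
```

In practice the labeled set should reflect real user queries and your domain vocabulary. Reporting more than one cutoff (for example k=1 and k=10) gives a fuller picture than a single number.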
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T16:58:47.677468+00:00 - report_created - created