Report #938
[research] Which embedding model should I use for RAG in 2025?
For hosted API retrieval, Gemini Embedding 2 and Voyage 4 Large currently lead broad MTEB-style retrieval; for self-hosted/open-weight, use Qwen3-Embedding-8B or Jina v5-text-small, and keep BGE-M3 as the budget fallback. If you need Matryoshka truncation or multimodal (text+image) retrieval, Gemini Embedding 2 and Cohere Embed v4 are the standard options. Always benchmark on your own queries/documents, because MTEB averages can misrank models for a specific domain.
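One way to act on the "benchmark on your own data" advice is a small recall@k harness. The sketch below is illustrative only: it assumes you have already obtained embeddings from whichever model you are testing (no vendor API is called here), and the names `truncate_and_normalize`, `recall_at_k`, and the `dims` parameter are hypothetical helpers, not part of any library. The optional `dims` argument mimics Matryoshka truncation by keeping the leading components and re-normalizing.

```python
import math
from typing import Dict, List, Optional, Sequence, Set

Vector = List[float]


def truncate_and_normalize(vec: Sequence[float], dims: Optional[int] = None) -> Vector:
    """Keep the first `dims` components (Matryoshka-style) and L2-normalize.

    Assumption: the model was trained so that leading dimensions stay useful
    after truncation. For other models, truncation may hurt badly.
    """
    kept = list(vec[:dims]) if dims is not None else list(vec)
    norm = math.sqrt(sum(x * x for x in kept))
    return [x / norm for x in kept] if norm > 0 else kept


def recall_at_k(
    query_vecs: Dict[str, Vector],
    doc_vecs: Dict[str, Vector],
    relevant: Dict[str, Set[str]],
    k: int = 5,
    dims: Optional[int] = None,
) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    docs = {doc_id: truncate_and_normalize(v, dims) for doc_id, v in doc_vecs.items()}
    hits = 0
    for query_id, raw in query_vecs.items():
        q = truncate_and_normalize(raw, dims)
        # Rank documents by cosine similarity (dot product of unit vectors).
        ranked = sorted(
            docs,
            key=lambda d: -sum(a * b for a, b in zip(q, docs[d])),
        )
        if relevant[query_id] & set(ranked[:k]):
            hits += 1
    return hits / len(query_vecs)


# Toy data standing in for real model output (hypothetical values).
doc_vecs = {"d1": [1.0, 0.0, 0.0, 0.0], "d2": [0.0, 1.0, 0.0, 0.0], "d3": [0.0, 0.0, 1.0, 0.0]}
query_vecs = {"q1": [0.9, 0.1, 0.0, 0.0], "q2": [0.0, 0.2, 0.9, 0.0]}
relevant = {"q1": {"d1"}, "q2": {"d3"}}

full = recall_at_k(query_vecs, doc_vecs, relevant, k=1)
truncated = recall_at_k(query_vecs, doc_vecs, relevant, k=1, dims=2)
print(f"recall@1 full={full:.2f} truncated(dims=2)={truncated:.2f}")
```

Running the same harness over each candidate model's embeddings, with your own labeled query/document pairs, gives a like-for-like comparison that does not depend on leaderboard averages. In the toy data, truncating to two dimensions discards the signal that `q2` relies on, which is the kind of regression worth checking before shrinking vectors in production.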
Journey Context:
OpenAI text-embedding-3-large has not been updated since early 2024 and now sits mid-pack on leaderboards; the field has moved to larger decoder-based embeddings and Matryoshka dimensions. The key tradeoff is accuracy versus serving cost: Gemini Embedding 2 scores well on code and cross-lingual retrieval but is a closed API; Qwen3-Embedding-8B and Jina v5-text-small give near-frontier quality under Apache 2.0. BGE-M3 remains valuable because it supports dense + sparse + multi-vector retrieval in one MIT-licensed model, which can beat a larger dense-only model on keyword-heavy domains.
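To illustrate why combining dense and sparse signals can help on keyword-heavy queries, here is a minimal sketch of weighted score fusion. It does not call BGE-M3 or any library; the dense vectors and per-token sparse weights are made-up stand-ins for model output, and `hybrid_rank` and the `alpha` weight are hypothetical names chosen for this example. The right fusion weight is an assumption you would need to tune on your own data.

```python
from typing import Dict, List, Tuple

Vector = List[float]
SparseWeights = Dict[str, float]  # token -> lexical weight


def dense_score(q: Vector, d: Vector) -> float:
    """Dot product; equals cosine similarity if both vectors are unit length."""
    return sum(a * b for a, b in zip(q, d))


def sparse_score(q: SparseWeights, d: SparseWeights) -> float:
    """Sum of weight products over tokens shared by query and document."""
    return sum(w * d[token] for token, w in q.items() if token in d)


def hybrid_rank(
    query: Tuple[Vector, SparseWeights],
    docs: Dict[str, Tuple[Vector, SparseWeights]],
    alpha: float = 0.5,
) -> List[str]:
    """Rank doc ids by alpha * dense + (1 - alpha) * sparse, best first.

    alpha=1.0 reduces to dense-only ranking; alpha=0.0 to sparse-only.
    """
    q_dense, q_sparse = query

    def score(doc_id: str) -> float:
        d_dense, d_sparse = docs[doc_id]
        return alpha * dense_score(q_dense, d_dense) + (1 - alpha) * sparse_score(q_sparse, d_sparse)

    return sorted(docs, key=lambda doc_id: -score(doc_id))


# Hypothetical query that hinges on an exact identifier ("E1234").
query = ([1.0, 0.0], {"E1234": 1.0})
docs = {
    "A": ([0.8, 0.6], {"E1234": 0.5}),  # mentions the identifier
    "B": ([0.9, 0.4], {}),              # semantically close, no identifier
}

print("dense only:", hybrid_rank(query, docs, alpha=1.0))
print("hybrid:    ", hybrid_rank(query, docs, alpha=0.5))
```

In this toy case the dense-only ranking prefers document B, while the fused score promotes document A because it contains the exact identifier. Whether that effect holds on a real corpus is something to measure rather than assume.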
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T14:59:32.702083+00:00— report_created — created