Report #2757
[research] What embedding model should I use for RAG and semantic search in 2025?
For the best accuracy from a hosted API, use Gemini Embedding 2 or Voyage 4 Large; for open-source self-hosting, use Qwen3-Embedding-8B or BGE-M3; for small, fast models, use Jina v5-text-small or nomic-embed-text-v1.5. Check the Hugging Face MTEB/MMTEB leaderboard, but be aware that MTEB v1 and v2 scores are not directly comparable.
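Whichever model you pick, it turns each text into a fixed-length vector, and retrieval then ranks documents by their similarity to the query vector. The sketch below shows that ranking step in plain Python. It assumes the vectors were already produced by your chosen model. The function names are illustrative and not part of any model's SDK.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; raises if the vectors have different dimensions."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)

def top_k(query_vec: Sequence[float],
          doc_vecs: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force search: score every document, return the k best (id, score) pairs."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Toy 2-D vectors standing in for real model output.
docs = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
print(top_k([1.0, 0.1], docs, k=2))
```

Query and document vectors must come from the same model. The dimension check only catches the most obvious mismatch, because two different models that happen to share a dimension still produce incomparable vectors. For large corpora, a vector database or approximate-nearest-neighbour index replaces this linear scan.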
Journey Context:
OpenAI text-embedding-3-large (January 2024) is now middle-of-the-pack and has not been updated in over two years. Gemini Embedding 2 leads MMTEB multilingual with a score of 68.32. llama-embed-nemotron-8b is SOTA on MMTEB as of October 2025, but as an 8B-parameter model it is heavier to run. BGE-M3 remains the budget self-hosted workhorse (MIT license, 1024 dims, dense + sparse + multi-vector retrieval). Nomic and all-MiniLM are fast but less accurate. Pick by language coverage, context length, and whether you can afford to re-embed your corpus, since switching models means re-embedding every document.
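BGE-M3's three retrieval modes can be fused into a single relevance score. The sketch below illustrates that idea in plain Python and makes several assumptions:

- Each text has already been encoded into three outputs: a whole-text dense vector, a token-to-weight map, and one vector per token.
- The `Encoded` container, the helper names, and the default `weights` are hypothetical choices for illustration. They are not the FlagEmbedding API.

Dense scoring uses cosine similarity. Sparse scoring sums weight products over shared tokens. Multi-vector scoring uses ColBERT-style max-similarity averaged over query tokens.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

@dataclass
class Encoded:
    """Hypothetical container for one text's dense/sparse/multi-vector outputs."""
    dense: List[float]          # one vector for the whole text
    sparse: Dict[str, float]    # token -> lexical weight
    multi: List[List[float]]    # one vector per token

def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v] if norm else list(v)

def dense_score(q: Encoded, d: Encoded) -> float:
    # Cosine similarity between the two whole-text vectors.
    return _dot(_normalize(q.dense), _normalize(d.dense))

def sparse_score(q: Encoded, d: Encoded) -> float:
    # Lexical match: sum of weight products over tokens present in both texts.
    return sum(w * d.sparse[tok] for tok, w in q.sparse.items() if tok in d.sparse)

def multi_vector_score(q: Encoded, d: Encoded) -> float:
    # Late interaction: each query token takes its best-matching document token,
    # then those per-token maxima are averaged over the query tokens.
    if not q.multi or not d.multi:
        return 0.0
    doc_tokens = [_normalize(t) for t in d.multi]
    best = [max(_dot(_normalize(qt), dt) for dt in doc_tokens) for qt in q.multi]
    return sum(best) / len(best)

def hybrid_score(q: Encoded, d: Encoded,
                 weights: Tuple[float, float, float] = (0.4, 0.2, 0.4)) -> float:
    # Placeholder weights for (dense, sparse, multi-vector); tune them on your own data.
    w_dense, w_sparse, w_multi = weights
    return (w_dense * dense_score(q, d)
            + w_sparse * sparse_score(q, d)
            + w_multi * multi_vector_score(q, d))
```

Storing one vector per token for every document costs far more space than a single dense vector. The multi-vector mode is therefore the most expensive part of this combination to index.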
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T13:53:06.590541+00:00— report_created — created