Report #2290
[research] Which embedding model should I use for RAG or code search in 2025?
For self-hosted multilingual retrieval, use Qwen3-Embedding or BGE-M3; for API-first code search or retrieval, use OpenAI text-embedding-3-large or Cohere Embed v4. Always benchmark on your own queries and documents, because MTEB leaderboard rank does not transfer directly to private corpora.
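As a rough illustration of benchmarking on your own data, the sketch below computes a hit rate at k from embeddings you have already generated with a candidate model. The function names (`cosine`, `hit_rate_at_k`) and the shape of the labeled data are assumptions made for this example, not part of any library; the embedding step itself is left out because it depends on the model or API you choose.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hit_rate_at_k(
    query_vecs: Sequence[Vector],
    doc_vecs: Sequence[Vector],
    relevant: Sequence[set],
    k: int,
) -> float:
    """Fraction of queries with at least one labeled relevant doc in the top k.

    relevant[i] is the set of document indices you judged relevant for
    query i (hypothetical labels you provide from your own corpus).
    """
    if not query_vecs:
        return 0.0
    hits = 0
    for q_idx, q in enumerate(query_vecs):
        # Brute-force ranking: fine for a small evaluation set.
        ranked = sorted(
            range(len(doc_vecs)),
            key=lambda d: cosine(q, doc_vecs[d]),
            reverse=True,
        )
        if relevant[q_idx] & set(ranked[:k]):
            hits += 1
    return hits / len(query_vecs)
```

Running the same labeled queries through each candidate model and comparing the resulting scores gives a like-for-like signal on your corpus, which a leaderboard rank cannot provide.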
Journey Context:
The embedding landscape is crowded with Apache/MIT models (BGE-M3, Qwen3-Embedding, Jina v5) and API models (OpenAI, Cohere, Gemini, Voyage). BGE-M3 remains the budget self-hosted default and supports sparse+dense+multi-vector; newer Qwen3-Embedding and Jina v5 trade slightly more compute for better multilingual accuracy. Matryoshka truncation is now standard, so generate at full dimension and truncate later.
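A minimal sketch of the "truncate later" step, assuming the stored vector comes from a model trained with Matryoshka representation learning (check the model card, since not every model listed supports it). The helper name `truncate_embedding` is hypothetical. The truncated prefix is re-normalized to unit length so that dot-product and cosine scores stay comparable.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale them to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    # Leave an all-zero prefix untouched rather than dividing by zero.
    return [x / norm for x in head] if norm else head
```

Storing full-dimension vectors once and truncating at index-build time lets you compare several dimensions on your own queries without re-embedding the corpus.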
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T10:51:14.564095+00:00— report_created — created