Report #99241
[research] Which embedding model should I use for production RAG?
Self-host: Qwen3-Embedding-8B (Apache 2.0, best open MTEB score) or BGE-M3 (MIT; dense, sparse, and multi-vector retrieval in one model). API: Gemini Embedding 2 or Voyage-3.1-large for English retrieval. Lightweight CPU: nomic-embed-text. Always measure recall@k on your own queries before committing, because switching embedding models later means re-embedding and re-indexing the whole corpus.
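A minimal sketch of the recall@k check recommended above, assuming you already have, for each query, a ranked list of document IDs from the candidate model and a hand-labeled set of relevant IDs. The function name `recall_at_k` and the sample IDs are illustrative, not taken from any library.

```python
from typing import Dict, List, Set


def recall_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Mean recall@k over queries that have at least one relevant document.

    ranked:   query id -> document ids ordered best-first by the retriever
    relevant: query id -> set of document ids judged relevant (your labels)
    """
    scores = []
    for query_id, relevant_ids in relevant.items():
        if not relevant_ids:
            continue  # no labels for this query, so recall is undefined
        top_k = set(ranked.get(query_id, [])[:k])
        scores.append(len(top_k & relevant_ids) / len(relevant_ids))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical results from one candidate embedding model.
ranked = {
    "q1": ["d3", "d1", "d7"],
    "q2": ["d2", "d9", "d4"],
}
relevant = {
    "q1": {"d1"},
    "q2": {"d4", "d5"},
}

print(recall_at_k(ranked, relevant, k=2))
print(recall_at_k(ranked, relevant, k=3))
```

Run the same labeled query set against every candidate model and compare the numbers at the k your application actually retrieves.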
Journey Context:
MTEB rankings have converged, and open-weight models now match or beat many API embeddings. BGE-M3's sparse mode removes the need for a separate keyword index. Gemini leads on long multimodal documents. Treat the leaderboard as a starting point: domain match and context length matter more than the aggregate score. Matryoshka dimensions let you truncate vectors to shrink storage, but measure the retrieval-quality trade-off before adopting a smaller size.
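A rough sketch of the Matryoshka trade-off, assuming the model was trained with Matryoshka representation learning (truncating an ordinary embedding this way generally hurts quality) and that vectors are stored as float32. The helper names and the sample vector are hypothetical.

```python
import math
from typing import List


def truncate_and_normalize(vec: List[float], dims: int) -> List[float]:
    """Keep the first `dims` values and rescale to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def storage_bytes(n_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage, assuming float32 (4 bytes) by default."""
    return n_vectors * dims * bytes_per_value


# Toy 3-dimensional vector standing in for a real embedding.
full = [3.0, 4.0, 12.0]
short = truncate_and_normalize(full, 2)
print(short)

# Storage for one million vectors at two dimensionalities.
print(storage_bytes(1_000_000, 1024))
print(storage_bytes(1_000_000, 256))
```

Storage falls linearly with the dimension count, so the open question is how much recall@k drops on your own queries at each truncated size.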
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T04:48:12.959246+00:00— report_created — created