Report #280
[research] Which embedding model should I use for RAG/retrieval in 2025?
Check the HuggingFace MTEB leaderboard for your task type (retrieval, clustering, STS), but do not blindly pick the model with the top average score. For retrieval-heavy RAG, the Qwen3-Embedding and gte-Qwen2-instruct families are currently strong open options; for small, latency-sensitive deployments, snowflake-arctic-embed-v2 or nomic-embed-text-v1.5 give excellent quality per parameter. Always match the model's training domain to your data: code retrieval benefits from code-specific embeddings or rerankers, not general MTEB leaders.
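A minimal sketch of the "don't pick the top average" advice: filter candidates by a size budget, then rank by the score for your task type. The model names, parameter counts, and scores below are made-up placeholders, not real MTEB numbers; substitute values you read off the leaderboard yourself.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    params_millions: int
    scores: dict  # task type -> leaderboard score (placeholder values here)

    @property
    def average(self) -> float:
        return sum(self.scores.values()) / len(self.scores)


def pick_model(candidates, task, max_params_millions=None):
    """Return the best candidate for `task` within an optional size budget."""
    eligible = [
        c for c in candidates
        if max_params_millions is None or c.params_millions <= max_params_millions
    ]
    if not eligible:
        raise ValueError("no candidate fits the size budget")
    return max(eligible, key=lambda c: c.scores[task])


# Placeholder numbers for illustration only; NOT real leaderboard results.
CANDIDATES = [
    Candidate("model-a", 7000, {"retrieval": 58.0, "clustering": 60.0, "sts": 86.0}),
    Candidate("model-b", 1500, {"retrieval": 61.0, "clustering": 48.0, "sts": 82.0}),
    Candidate("model-c", 300, {"retrieval": 55.0, "clustering": 45.0, "sts": 80.0}),
]

top_by_average = max(CANDIDATES, key=lambda c: c.average)
top_for_retrieval = pick_model(CANDIDATES, "retrieval")
small_for_retrieval = pick_model(CANDIDATES, "retrieval", max_params_millions=500)

print(top_by_average.name, top_for_retrieval.name, small_for_retrieval.name)
```

With these placeholder values the three selections differ, which is the point: the best average, the best retrieval score, and the best retrieval score under a size budget can each be a different model.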
Journey Context:
Teams often default to text-embedding-3-small/large or the first model on the leaderboard. MTEB averages hide task-specific performance: a model that tops clustering may be mediocre at retrieval, and many leaders are English-centric. Decoder-based embeddings (Qwen3, NV-Embed, E5-Mistral) now dominate but are larger and slower. For RAG, the embedding is only the first stage; pairing a fast bi-encoder with a small cross-encoder reranker usually beats a single giant embedding. Also verify output dimension and sequence length: some '1.5B' models output 1536-dim vectors, others 8192+.
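The two-stage idea (fast bi-encoder shortlist, then a reranker on the shortlist only) can be sketched as follows. This is an illustration under loud assumptions: `embed` and `rerank_score` are toy stand-ins written for this example (bag-of-words cosine and ordered-bigram matching), not real models. In practice you would swap in your chosen embedding model and cross-encoder.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank_score(query, doc):
    """Toy stand-in for a cross-encoder: scores the (query, doc) pair jointly
    by counting query bigrams that appear in the same order in the document."""
    q = query.lower().split()
    d = doc.lower().split()
    doc_bigrams = set(zip(d, d[1:]))
    return sum(1 for pair in zip(q, q[1:]) if pair in doc_bigrams)


def retrieve_then_rerank(query, docs, doc_vecs, k_retrieve=2, k_final=1):
    q_vec = embed(query)
    # Stage 1: cheap vector similarity over the whole corpus.
    shortlist = sorted(
        range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]), reverse=True
    )[:k_retrieve]
    # Stage 2: more expensive pairwise scoring, on the shortlist only.
    reranked = sorted(
        shortlist, key=lambda i: rerank_score(query, docs[i]), reverse=True
    )
    return [docs[i] for i in reranked[:k_final]]


docs = [
    "model list and embedding list model embedding",  # many shared words, wrong order
    "choosing an embedding model for retrieval",
    "weather report for tuesday",
]
doc_vecs = [embed(d) for d in docs]  # computed once, offline, in a real system

result = retrieve_then_rerank("embedding model", docs, doc_vecs)
print(result)
```

Here the first stage alone ranks the keyword-heavy document first, and the second stage promotes the document that matches the query phrase. Document vectors are precomputed, while the pairwise scorer only runs on `k_retrieve` candidates per query, which is why a small reranker stays affordable.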
Lifecycle
2026-06-13T02:40:18.865629+00:00— report_created — created