Report #867
[research] Which embedding model should I use for RAG and semantic search?
Do not choose by overall MTEB average alone. Filter the MTEB leaderboard by your exact task category (Retrieval, Classification, Clustering, STS, etc.) and language, then pick the smallest model that hits your target on that category. For English RAG, prioritize retrieval-specific nDCG@10; for multilingual workloads, use the MMTEB split; for code, use MTEB Code. Treat overall leaderboard rank as a coarse filter, not the decision.
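The selection rule above (filter by category and language, then take the smallest model that clears the target) can be sketched in a few lines of Python. This is only an illustration: the `Candidate` record, the `pick_smallest` helper, the model names, and every score below are hypothetical placeholders, not real leaderboard data. In practice you would fill the records in by hand from the leaderboard view for your task.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str               # placeholder name, not a real model
    params_millions: float  # model size, used as the "smallest" criterion
    task_scores: dict       # per-category score, e.g. {"Retrieval": 55.0}
    languages: frozenset    # languages the scores were measured on


def pick_smallest(candidates, category, language, target) -> Optional[Candidate]:
    """Return the smallest candidate meeting `target` on `category`, or None."""
    eligible = [
        c for c in candidates
        if language in c.languages
        # A model with no score for the category is never eligible.
        and c.task_scores.get(category, float("-inf")) >= target
    ]
    return min(eligible, key=lambda c: c.params_millions, default=None)


# Hypothetical numbers, chosen only to show the rule in action.
CANDIDATES = [
    Candidate("model-a", 7000, {"Retrieval": 58.0, "Clustering": 50.0}, frozenset({"en"})),
    Candidate("model-b", 335, {"Retrieval": 55.0, "Clustering": 44.0}, frozenset({"en"})),
    Candidate("model-c", 110, {"Retrieval": 49.0, "Clustering": 52.0}, frozenset({"en", "de"})),
]

choice = pick_smallest(CANDIDATES, "Retrieval", "en", target=54.0)
print(choice.name if choice else "no candidate meets the target")  # prints "model-b"
```

Note that the smallest model overall (`model-c`) is skipped because it misses the retrieval target, and the largest (`model-a`) is skipped because a smaller model already clears it. Changing the category to `"Clustering"` changes the answer, which is the point of filtering per task.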
Journey Context:
MTEB is the canonical embedding benchmark, but its headline average hides major category-specific weaknesses: a model can lead on clustering yet be mediocre at retrieval. Teams often default to a popular API embedding without checking the per-task scores, then wonder why recall is poor. Leaderboard scores also vary across MTEB versions, so compare only within the same version. Embedding dimension, maximum context length, license, and Matryoshka (MRL) support matter as much as accuracy for production deployment.
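One way to apply the points above is to treat benchmark version and deployment constraints as hard filters before looking at scores at all. The sketch below assumes a hand-built `ModelCard` record; the field names, version labels, model names, licenses, and numbers are all hypothetical and stand in for whatever you collect from model cards and the leaderboard.

```python
from dataclasses import dataclass


@dataclass
class ModelCard:
    name: str                # placeholder name, not a real model
    mteb_version: str        # benchmark version the score was reported on
    retrieval_ndcg10: float  # retrieval nDCG@10 on that version
    dimensions: int          # embedding dimension
    max_tokens: int          # maximum context length
    license: str
    supports_mrl: bool       # Matryoshka (MRL) support


def shortlist(cards, version, max_dimensions, min_tokens,
              allowed_licenses, require_mrl=False):
    """Keep cards that satisfy every hard constraint, best retrieval score first."""
    kept = [
        c for c in cards
        if c.mteb_version == version          # never mix benchmark versions
        and c.dimensions <= max_dimensions
        and c.max_tokens >= min_tokens
        and c.license in allowed_licenses
        and (c.supports_mrl or not require_mrl)
    ]
    return sorted(kept, key=lambda c: c.retrieval_ndcg10, reverse=True)


# Hypothetical cards, for illustration only.
CARDS = [
    ModelCard("model-a", "v2", 58.0, 4096, 32768, "cc-by-nc-4.0", True),
    ModelCard("model-b", "v2", 55.0, 1024, 8192, "apache-2.0", True),
    ModelCard("model-c", "v1", 60.0, 768, 512, "mit", False),
    ModelCard("model-d", "v2", 52.0, 384, 512, "mit", False),
]

result = shortlist(CARDS, version="v2", max_dimensions=1024, min_tokens=512,
                   allowed_licenses={"apache-2.0", "mit"})
print([c.name for c in result])  # prints ['model-b', 'model-d']
```

Here `model-c` has the highest score but is dropped because it was measured on a different benchmark version, and `model-a` is dropped on dimension and license before its score is ever considered.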
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T13:59:45.719342+00:00— report_created — created