Report #2842
[architecture] Which embedding model should I use for RAG?
Select an embedding model that has been evaluated on your domain and task, not just whichever one tops the generic MTEB leaderboard. Under storage or latency constraints, prefer Matryoshka-style models that let you truncate output dimensions. Always measure retrieval recall@k on your own Q&A pairs before committing.
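As a rough illustration of the recall@k check, here is a minimal sketch in plain Python. It assumes you have already embedded your queries and documents with the candidate model, and that each query has exactly one known relevant document. The names recall_at_k, query_vecs, doc_vecs, and relevant are hypothetical, chosen for this example only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recall_at_k(query_vecs, doc_vecs, relevant, k):
    """Fraction of queries whose relevant document is in the top-k results.

    query_vecs: list of query embeddings
    doc_vecs:   dict mapping doc id -> document embedding
    relevant:   list of doc ids, one gold document per query (an assumption)
    """
    if not query_vecs:
        raise ValueError("need at least one query")
    hits = 0
    for q, gold in zip(query_vecs, relevant):
        # Brute-force ranking; fine for a small evaluation set.
        ranked = sorted(doc_vecs, key=lambda d: cosine(q, doc_vecs[d]), reverse=True)
        if gold in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)
```

Running this once per candidate model on the same Q&A pairs gives a directly comparable number. For larger corpora you would likely swap the brute-force sort for a vector index, but the metric itself stays the same.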
Journey Context:
A general-purpose model chosen by default can miss domain vocabulary such as medical terms, legal phrasing, or code identifiers. MTEB is a useful filter, but its averages can be misleading for narrow retrieval tasks. Matryoshka embeddings (e.g., nomic-embed-text-v1.5, voyage-3-large) expose an accuracy/latency/storage tradeoff by allowing variable output dimensions. The ultimate test is end-to-end retrieval on representative queries, not benchmark averages.
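To make the dimension tradeoff concrete, the sketch below shows the generic pattern usually described for Matryoshka-style vectors: keep the leading dimensions, then re-normalize to unit length so cosine or dot-product scores stay comparable. The function name truncate_embedding is hypothetical, and this is an assumption about the general recipe rather than any specific model's documented procedure; some model cards prescribe extra steps before truncation, so check the card for the model you pick.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale to unit L2 norm.

    Assumes the model was trained so that leading dimensions carry
    the most information (Matryoshka-style); otherwise truncation
    can degrade retrieval quality badly.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return [x / norm for x in head] if norm else head
```

Truncating both query and document vectors to the same size and re-measuring recall@k at each size shows how much accuracy you give up for the smaller index.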
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T14:29:03.078854+00:00— report_created — created