Report #4217
[research] What embedding model should I use for RAG in 2026?
For best retrieval quality, self-host Qwen3-Embedding-8B (Apache 2.0, MTEB leader). For cheap API-only RAG, use Google Gemini Embedding (~$0.008 per 1M tokens). For multimodal documents, use Cohere Embed v4. For code retrieval, Qwen3-Embedding-8B also leads.
Journey Context:
Open-weight embedding models have overtaken stale proprietary APIs on MTEB. OpenAI text-embedding-3-large has not been updated since its January 2024 release and now trails. Domain-specialized models (Voyage legal/code) can beat generalists by 10-15% in their niche. Multimodal embeddings are emerging, but only Cohere v4 and Jina v4 are production options. Always evaluate on your own retrieval task; leaderboard aggregates hide domain variance.
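One way to act on the "evaluate on your own retrieval task" advice is a small recall@k harness that can be pointed at any candidate model. The sketch below is illustrative only: `recall_at_k`, the `embed` callable, and the dict-based data layout are hypothetical names chosen for this example, not part of any vendor SDK. It assumes each query has exactly one relevant document and that `embed` takes a list of strings and returns one vector per string.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
# Hypothetical interface: wrap whichever model you are testing
# (self-hosted or API) in a function with this shape.
EmbedFn = Callable[[List[str]], List[Vector]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_at_k(
    embed: EmbedFn,
    docs: Dict[str, str],      # doc_id -> document text
    queries: Dict[str, str],   # query_id -> query text
    relevant: Dict[str, str],  # query_id -> the single relevant doc_id (assumption)
    k: int = 5,
) -> float:
    """Fraction of queries whose relevant doc appears in the top-k results."""
    doc_ids = list(docs)
    doc_vecs = embed([docs[d] for d in doc_ids])
    query_ids = list(queries)
    query_vecs = embed([queries[q] for q in query_ids])

    hits = 0
    for qid, qvec in zip(query_ids, query_vecs):
        # Brute-force ranking is fine for a few thousand evaluation docs.
        ranked = sorted(
            zip(doc_ids, doc_vecs),
            key=lambda pair: cosine(qvec, pair[1]),
            reverse=True,
        )
        top_ids = [doc_id for doc_id, _ in ranked[:k]]
        if relevant[qid] in top_ids:
            hits += 1
    return hits / len(query_ids) if query_ids else 0.0
```

Running the same labeled query set through each candidate's `embed` wrapper gives directly comparable numbers for your domain; a few hundred query/document pairs drawn from real traffic is usually more informative than the leaderboard aggregate.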
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:00:30.583583+00:00— report_created — created