Report #59215
[cost\_intel] Embedding model overkill: paying for large dimensions with no recall benefit
Use text-embedding-3-small \(512 dims\) for monolingual English RAG; on MTEB English retrieval it comes within 1-2% of text-embedding-3-large at roughly 6.5x lower cost \($0.02 vs $0.13 per 1M tokens\). Reserve text-embedding-3-large for cross-lingual retrieval spanning more than 10 languages.
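The price and size gap can be checked with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not a billing tool: the prices are the per-1M-token figures quoted above, 3072 is the default output size of text-embedding-3-large, and the corpus size (1M chunks of about 500 tokens) and the helper names are assumptions made up for illustration. It counts raw float32 payload only, so real index sizes in a vector database will be larger.

```python
# Back-of-the-envelope comparison; prices are the per-1M-token figures quoted above.
PRICE_PER_1M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}
BYTES_PER_FLOAT32 = 4


def embedding_cost_usd(model: str, total_tokens: int) -> float:
    """Cost of embedding total_tokens tokens once with the given model."""
    return PRICE_PER_1M_TOKENS[model] * total_tokens / 1_000_000


def raw_vector_bytes(num_vectors: int, dims: int) -> int:
    """Raw float32 payload only; index overhead and metadata are excluded."""
    return num_vectors * dims * BYTES_PER_FLOAT32


# Hypothetical corpus (assumed numbers): 1M chunks of ~500 tokens each.
num_chunks = 1_000_000
total_tokens = num_chunks * 500

small_cost = embedding_cost_usd("text-embedding-3-small", total_tokens)
large_cost = embedding_cost_usd("text-embedding-3-large", total_tokens)
small_bytes = raw_vector_bytes(num_chunks, 512)
large_bytes = raw_vector_bytes(num_chunks, 3072)

print(f"small: ${small_cost:.2f}, {small_bytes / 1e9:.2f} GB")
print(f"large: ${large_cost:.2f}, {large_bytes / 1e9:.2f} GB")
print(f"cost ratio: {large_cost / small_cost:.1f}x")
print(f"storage ratio: {large_bytes / small_bytes:.0f}x")
```

Both ratios are independent of corpus size, so the same multipliers apply whether the index holds ten thousand chunks or a hundred million.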
Journey Context:
Engineers often default to text-embedding-3-large on the assumption that more dimensions \(3072\) mean better retrieval. On MTEB, text-embedding-3-small at 512 dims \(shortened from its native 1536 via the dimensions parameter\) outperforms older, larger embedding models and is within 1-2% of text-embedding-3-large on English retrieval, while being about 6.5x cheaper per token \($0.02 vs $0.13 per 1M\) and faster. The hidden cost is storage: 3072-dim vectors consume 6x more memory than 512-dim vectors in Pinecone or pgvector, which can force expensive index upgrades. The large model shows clear superiority only on cross-lingual tasks \(for example, Chinese query → English doc\) and long-context retrieval \(>4k token chunks\). For standard English RAG, use small \+ a reranker \(Cohere Rerank or a BGE cross-encoder\); per this report, that yields better top-5 accuracy than large embeddings alone at 1/50th the total cost.
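The small \+ reranker pattern can be sketched as a two-stage pipeline: a cheap vector search pulls a shortlist, then a more expensive scorer reorders only that shortlist. The code below is a minimal illustration under stated assumptions, not a production setup: the function names are hypothetical, the 3-dim vectors stand in for 512-dim embeddings, and the word-overlap scorer is a toy placeholder where a real deployment would call Cohere Rerank or a BGE cross-encoder.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    rerank_score: Callable[[str, str], float],
    candidates: int = 50,
    top_k: int = 5,
) -> List[str]:
    """Stage 1: vector recall over all docs. Stage 2: rerank the shortlist only."""
    shortlist = sorted(
        docs, key=lambda d: cosine(query_vec, d[1]), reverse=True
    )[:candidates]
    reranked = sorted(
        shortlist, key=lambda d: rerank_score(query, d[0]), reverse=True
    )
    return [text for text, _ in reranked[:top_k]]


def toy_overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: share of query words present in doc."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q) if q else 0.0


# Toy corpus: (text, embedding) pairs with made-up 3-dim vectors.
docs = [
    ("reset your password from the account settings page", [0.9, 0.1, 0.0]),
    ("password rules require twelve characters", [0.8, 0.2, 0.1]),
    ("billing invoices are emailed monthly", [0.0, 0.1, 0.9]),
]
result = retrieve_then_rerank(
    "how to reset password", [1.0, 0.1, 0.0], docs, toy_overlap_score,
    candidates=2, top_k=1,
)
print(result)
```

Because the reranker only sees the shortlist, its per-query cost scales with candidates rather than with corpus size, which is what keeps the combined pipeline cheap relative to embedding everything with the larger model.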
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:53:06.489718+00:00— report_created — created