Report #85035
[cost_intel] Defaulting to large or older commercial embedding models (text-embedding-3-large, ada-002) for all RAG retrieval, paying 20-50x more than necessary for standard English document retrieval
Use text-embedding-3-small or open-source models (BGE-small, E5-base) for monolingual English RAG; reserve large models for multilingual retrieval, cross-lingual tasks, or cases where the MTEB leaderboard or an evaluation on your own data shows a >2% retrieval-quality gap in your domain
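One way to apply the >2% rule to your own domain is to score both candidate models on a small labeled query set before committing. The sketch below is a minimal, stdlib-only illustration under stated assumptions: embeddings are precomputed lists of floats from whichever two models you compare, relevance labels are hand-made, the metric is a simple hit rate at k (not the nDCG@10 that MTEB reports), and "2%" is read as 2 absolute percentage points, which the report does not specify. The names `hit_rate_at_k` and `choose_model` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hit_rate_at_k(
    query_vecs: Dict[str, List[float]],
    doc_vecs: Dict[str, List[float]],
    relevant: Dict[str, Set[str]],
    k: int = 5,
) -> float:
    """Fraction of queries with at least one relevant doc in the top-k by cosine similarity."""
    hits = 0
    for qid, qvec in query_vecs.items():
        # Rank every document id by similarity to this query, best first.
        ranked = sorted(doc_vecs, key=lambda d: cosine(qvec, doc_vecs[d]), reverse=True)
        if relevant.get(qid, set()) & set(ranked[:k]):
            hits += 1
    return hits / len(query_vecs)


def choose_model(
    small_score: float,
    large_score: float,
    multilingual: bool = False,
    cross_lingual: bool = False,
    max_gap: float = 0.02,
) -> str:
    """Hypothetical policy mirroring the recommendation: small by default,
    large for multilingual/cross-lingual corpora or a measured gap above max_gap."""
    if multilingual or cross_lingual:
        return "large"
    # Absolute gap on a 0-1 scale; 0.02 == 2 percentage points (an assumption).
    if large_score - small_score > max_gap:
        return "large"
    return "small"
```

In practice you would embed the same queries and documents once with each model, call `hit_rate_at_k` on both sets of vectors, and pass the two scores to `choose_model`. A few dozen representative queries are usually enough to see whether the gap is anywhere near the threshold.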
Journey Context:
Teams often assume 'bigger embedding = better retrieval.' However, on standard English BEIR benchmarks, text-embedding-3-small achieves 95%+ of the large model's performance at a fraction of the cost: at OpenAI's 2024 list prices it costs $0.00002 per 1k tokens, versus $0.00013 for text-embedding-3-large (6.5x more) and $0.0001 for ada-002 (5x more). Open models like BGE-small run locally with no per-token fees; you pay only for your own compute. The large models show significant gains mainly on multilingual tasks (e.g., retrieval across languages) or highly semantic tasks (e.g., 'find conceptually similar but keyword-different documents'). For standard internal document RAG, small models are sufficient and save thousands of dollars monthly at scale. The mistake is paying for multilingual capacity when your corpus is English-only.
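The per-token arithmetic is easy to check for your own volume. This sketch hard-codes OpenAI's per-1k-token list prices as published in 2024 (an assumption: prices change, so confirm them on the current pricing page), and the monthly token volume is a hypothetical input. Self-hosted models such as BGE-small are left out because their cost is your own compute, not a per-token fee.

```python
# Assumed OpenAI list prices in USD per 1,000 tokens (2024); verify before use.
PRICE_PER_1K_TOKENS = {
    "text-embedding-3-small": 0.00002,
    "text-embedding-ada-002": 0.0001,
    "text-embedding-3-large": 0.00013,
}


def embedding_cost_usd(tokens: int, model: str) -> float:
    """API cost of embedding `tokens` tokens with `model` at the assumed list price."""
    return tokens / 1000 * PRICE_PER_1K_TOKENS[model]


def monthly_savings_usd(tokens_per_month: int, current: str, cheaper: str) -> float:
    """Monthly saving from switching `current` to `cheaper` at the same token volume."""
    return embedding_cost_usd(tokens_per_month, current) - embedding_cost_usd(tokens_per_month, cheaper)


if __name__ == "__main__":
    volume = 10_000_000_000  # hypothetical: 10B tokens embedded per month
    for name in PRICE_PER_1K_TOKENS:
        print(f"{name}: ${embedding_cost_usd(volume, name):,.2f}")
    saving = monthly_savings_usd(volume, "text-embedding-3-large", "text-embedding-3-small")
    print(f"3-large -> 3-small saves ${saving:,.2f}/month")
```

At these prices the large-to-small difference is $110 per billion tokens, so the saving passes $1,000 a month only above roughly 9 billion tokens a month (for example, when re-embedding a large corpus or embedding high query traffic).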
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:19:09.196156+00:00— report_created — created