Report #70722
[cost\_intel] text-embedding-3-large used for all RAG retrieval regardless of domain or language
Use text-embedding-3-small for monolingual English, high-volume RAG: it reaches >95% of the large model's recall@5 at roughly 1/6 of the cost \($0.02 vs $0.13 per 1M tokens at list price\). Switch to text-embedding-3-large only for multilingual corpora \(>10 languages\) or specialized domain jargon \(biomedical, legal\), where the small model's recall drops below 80%.
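The cost side of this recommendation is simple arithmetic, sketched below. The function name `embedding_cost_usd` and the 2-billion-token corpus are hypothetical, and the per-1M-token prices are passed in as parameters because published rates change; treat the two values used here as assumptions and check the provider's pricing page before relying on them.

```python
def embedding_cost_usd(total_tokens: int, price_per_million_usd: float) -> float:
    """Return the one-off cost of embedding `total_tokens` tokens."""
    return total_tokens / 1_000_000 * price_per_million_usd


# Hypothetical corpus size: 2 billion tokens to ingest.
corpus_tokens = 2_000_000_000

# Assumed prices per 1M tokens; replace with the current published rates.
small_cost = embedding_cost_usd(corpus_tokens, 0.02)
large_cost = embedding_cost_usd(corpus_tokens, 0.13)

print(f"small: ${small_cost:,.2f}")
print(f"large: ${large_cost:,.2f}")
print(f"ratio: {large_cost / small_cost:.1f}x")
```

Re-embedding after a chunking or model change incurs the same cost again, so the difference compounds over the life of the index.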
Journey Context:
Embedding costs dominate RAG ingestion at scale. On standard English retrieval benchmarks \(BEIR, MTEB\) the small and large models are close, but the cost gap is large; MIRACL, which is multilingual, is where the difference widens. The cliff appears on code-switching text and rare biomedical terms. Teams default to 'large' on the assumption that bigger is better, which inflates embedding spend several-fold and, at the default dimensions \(3072 vs 1536\), doubles vector storage, with no quality gain. The specific heuristic: if your corpus is >90% English with Wikipedia-level vocabulary, small is optimal.
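The heuristic above can be written down as a small routing function. This is a sketch only: `CorpusProfile`, its fields, and `choose_embedding_model` are hypothetical names, and the final fallback to the large model for corpora the report does not cover is an assumption, not something the report states. Whatever the function returns, confirm it with a recall@5 check on a sample of your own queries.

```python
from dataclasses import dataclass

SMALL = "text-embedding-3-small"
LARGE = "text-embedding-3-large"


@dataclass
class CorpusProfile:
    """Hypothetical summary of a corpus; fill it in from your own sampling."""
    english_fraction: float   # share of documents in English, 0.0 to 1.0
    language_count: int       # number of distinct languages present
    specialized_jargon: bool  # e.g. biomedical or legal terminology
    code_switching: bool      # documents that mix languages mid-text


def choose_embedding_model(profile: CorpusProfile) -> str:
    """Apply the report's thresholds to pick an embedding model name."""
    # Broadly multilingual corpora: the report recommends the large model.
    if profile.language_count > 10:
        return LARGE
    # Specialized vocabulary or code-switching: where small reportedly degrades.
    if profile.specialized_jargon or profile.code_switching:
        return LARGE
    # Mostly English, general vocabulary: small is the recommended choice.
    if profile.english_fraction > 0.90:
        return SMALL
    # Assumption (not from the report): stay conservative for everything else.
    return LARGE


wiki_like = CorpusProfile(english_fraction=0.97, language_count=1,
                          specialized_jargon=False, code_switching=False)
print(choose_embedding_model(wiki_like))
```

Keeping the thresholds in one function makes them easy to revise once you have recall measurements from your own data.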
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:17:16.550522+00:00— report_created — created