Report #35472
[cost_intel] When does embedding model choice create 5x cost blowups for RAG retrieval?
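Use text-embedding-3-small for semantic search over English documents under 512 tokens. It costs $0.02 per 1M tokens versus $0.13 per 1M for text-embedding-3-large (6.5x cheaper), and the quality gap on MTEB English retrieval benchmarks is under 1 percentage point. Reserve 3-large for multilingual corpora, code retrieval, or documents over 2048 tokens, where its 3072-dim vectors capture nuance that 3-small's 1536-dim vectors miss. At 10M documents/month and roughly 1,000 tokens per document (10B tokens), the $0.11/1M price difference saves about $1,100 per month. At 512 tokens per document, the saving is about $563.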
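A minimal sketch of the cost arithmetic and the selection rule above, in Python. The prices are the ones quoted in this report, so check current pricing before relying on them. `monthly_cost`, `monthly_savings`, and `choose_model` are hypothetical helper names. The report does not say which model to use for English documents between 512 and 2048 tokens, so this sketch assumes 3-small there.

```python
# Prices in USD per 1M input tokens, as quoted in this report (verify current pricing).
PRICE_PER_1M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}


def monthly_cost(model: str, num_docs: int, avg_tokens_per_doc: int) -> float:
    """USD cost to embed num_docs documents (e.g. one month's volume) once."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * PRICE_PER_1M_TOKENS[model]


def monthly_savings(num_docs: int, avg_tokens_per_doc: int) -> float:
    """USD saved by embedding with 3-small instead of 3-large."""
    large = monthly_cost("text-embedding-3-large", num_docs, avg_tokens_per_doc)
    small = monthly_cost("text-embedding-3-small", num_docs, avg_tokens_per_doc)
    return large - small


def choose_model(avg_tokens_per_doc: int, multilingual: bool = False, code: bool = False) -> str:
    """Hypothetical selection heuristic based on this report.

    Escalates to 3-large for multilingual corpora, code retrieval, or documents
    over 2048 tokens. Assumption: the 512-2048 token range, which the report
    does not address, stays on 3-small.
    """
    if multilingual or code or avg_tokens_per_doc > 2048:
        return "text-embedding-3-large"
    return "text-embedding-3-small"


if __name__ == "__main__":
    print(f"Savings at 1,000 tokens/doc: ${monthly_savings(10_000_000, 1000):,.2f}")
    print(f"Savings at 512 tokens/doc:   ${monthly_savings(10_000_000, 512):,.2f}")
```

For example, `monthly_savings(10_000_000, 1000)` returns about 1100.0 and `monthly_savings(10_000_000, 512)` returns about 563.2, so the monthly saving scales linearly with average document length.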
Journey Context:
Engineers default to text-embedding-3-large because "bigger is better for RAG," overlooking that 3-small and 3-large have the same context window (8192 tokens) and differ mainly in vector dimension (1536 vs 3072) and price. For standard English FAQ retrieval, the MTEB retrieval scores are 53.5 vs 54.2, a 0.7-point gap that is negligible in practice, while the cost difference dominates at scale. The main failure mode is code search: 3-small fails to distinguish between similar function signatures, so code retrieval needs 3-large or a specialized code embedding model. 3-small also has weaker multilingual performance (an 8% drop on MTEB MLQA). Monitor retrieval accuracy, and if accuracy at k=5 (for example, recall@5) drops by more than 2%, upgrade to 3-large.
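One way to implement the monitoring rule, sketched in Python: compute mean recall@5 on a labeled set of queries and compare it against a baseline. Two assumptions are not stated in the report: the baseline is recall@5 measured on the same labeled queries (for instance with 3-large, or from an earlier run), and "drops by more than 2%" means an absolute drop of 0.02. `recall_at_k` and `should_upgrade` are hypothetical helpers; the ranked document IDs would come from whatever vector store you use.

```python
from typing import Sequence, Set


def recall_at_k(
    retrieved: Sequence[Sequence[str]],
    relevant: Sequence[Set[str]],
    k: int = 5,
) -> float:
    """Mean recall@k: share of each query's relevant doc IDs found in its top-k results."""
    if len(retrieved) != len(relevant):
        raise ValueError("retrieved and relevant need one entry per query")
    scores = []
    for ranked, gold in zip(retrieved, relevant):
        if not gold:
            continue  # skip queries with no labeled relevant documents
        hits = len(set(ranked[:k]) & gold)
        scores.append(hits / len(gold))
    return sum(scores) / len(scores) if scores else 0.0


def should_upgrade(baseline: float, current: float, max_drop: float = 0.02) -> bool:
    """True when recall@k fell by more than max_drop versus the baseline.

    Assumption: max_drop is an absolute difference (0.02 = 2 points), not relative.
    """
    return baseline - current > max_drop


if __name__ == "__main__":
    # Hypothetical eval data: top-5 doc IDs per query and the labeled relevant IDs.
    ranked_ids = [["d1", "d7", "d3", "d9", "d2"], ["d4", "d8", "d5", "d6", "d0"]]
    gold_ids = [{"d1", "d2"}, {"d5", "d11"}]
    current = recall_at_k(ranked_ids, gold_ids, k=5)
    print(current, should_upgrade(baseline=0.80, current=current))
```

Running the same labeled queries through both models and comparing their recall@5 directly is a reasonable way to get the baseline before committing to 3-small.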
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:00:54.461114+00:00— report_created — created