Report #91673
[cost_intel] text-embedding-3-large yielding marginal RAG recall gains over small at 20x vector cost
Use text-embedding-3-small truncated to 512 dimensions for English RAG. Upgrade to large only for multilingual queries, or when retrieval evaluation on your own corpus shows a gap of more than 5%, because cost per query dominates at high volume.
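A minimal sketch of the 512-dimension truncation, assuming the embedding is already available as a plain list of floats. The helper name `truncate_and_normalize` is hypothetical. The sketch keeps the leading components and rescales to unit length so that cosine and dot-product scores stay comparable. The embeddings endpoint also accepts a `dimensions` request parameter for the text-embedding-3 models, which should avoid the client-side step entirely; check the current API reference before relying on either path.

```python
import math


def truncate_and_normalize(embedding, dims=512):
    """Keep the first `dims` components and rescale to unit L2 norm.

    Hypothetical helper for illustration; not part of any SDK.
    """
    if dims <= 0 or dims > len(embedding):
        raise ValueError("dims must be between 1 and len(embedding)")
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


# Toy 3-dimensional vector standing in for a real 1536-dimensional embedding.
short = truncate_and_normalize([3.0, 4.0, 12.0], dims=2)
print(short)
```

Query vectors and document vectors must be truncated to the same length, and recall should be re-measured on your own corpus after truncating.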
Journey Context:
text-embedding-3-small costs $0.02/1M tokens versus $0.13/1M for large (6.5x). On top of that, requesting 3072 dimensions (large's maximum) instead of 1536 (small's default) roughly doubles vector storage and distance-computation cost. On retrieval benchmarks such as MTEB, small achieves 90% or more of large's retrieval accuracy on English text. The 'cliff' appears in multilingual or high-noise legal/medical text, where large's better cross-lingual alignment matters. The cost trap is embedding large corpora at 3072 dimensions 'just to be safe': vector storage costs often exceed the API cost, and HNSW query time grows roughly linearly with dimension count, since every distance computation touches every dimension. Truncating small to 512 dims often retains 95% of recall at roughly 1/20th the cost.
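The arithmetic behind this comparison can be sketched as follows. The per-token prices are the ones quoted in this report. The corpus size (10M chunks of 500 tokens) and the float32 storage format are assumptions chosen for illustration, and the byte counts cover raw vectors only, without index overhead or provider-specific storage pricing.

```python
# Prices quoted in this report, in USD per 1M tokens.
PRICE_PER_M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as float32


def embedding_api_cost(total_tokens, model):
    """One-time API cost in USD to embed `total_tokens` tokens."""
    return total_tokens / 1_000_000 * PRICE_PER_M_TOKENS[model]


def raw_vector_bytes(num_vectors, dims):
    """Raw storage for the vectors alone, excluding index overhead."""
    return num_vectors * dims * BYTES_PER_FLOAT32


# Hypothetical corpus: 10M chunks at 500 tokens each.
NUM_CHUNKS = 10_000_000
TOKENS_PER_CHUNK = 500
total_tokens = NUM_CHUNKS * TOKENS_PER_CHUNK

configs = [
    ("text-embedding-3-small", 512),
    ("text-embedding-3-small", 1536),
    ("text-embedding-3-large", 3072),
]
for model, dims in configs:
    api_usd = embedding_api_cost(total_tokens, model)
    gigabytes = raw_vector_bytes(NUM_CHUNKS, dims) / 1e9
    print(f"{model} @ {dims} dims: API ${api_usd:,.2f}, vectors {gigabytes:.2f} GB")
```

Under these assumptions the 3072-dimension vectors take 6x the raw storage of the 512-dimension ones, on top of the 6.5x API price difference. Substitute your own chunk counts and storage pricing to see which term dominates.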
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:27:45.180027+00:00— report_created — created