Report #38554
[cost\_intel] Using text-embedding-3-large for standard RAG retrieval pipelines
Use text-embedding-3-small for standard RAG retrieval: it is 6.5x cheaper \($0.02 vs $0.13 per 1M tokens\), with a reported <2% MRR@10 degradation on MTEB retrieval benchmarks.
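The 6.5x figure follows directly from the two per-token prices. A minimal sketch of that arithmetic, assuming the prices quoted in this report are still current; the corpus size and the helper names are hypothetical and only illustrate scale:

```python
# Prices in USD per 1M tokens, copied from this report (verify before relying on them).
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}


def embedding_cost(model: str, tokens: int) -> float:
    """Return the USD cost of embedding `tokens` tokens with `model`."""
    return PRICE_PER_MILLION_TOKENS[model] * tokens / 1_000_000


def cost_ratio(expensive: str, cheap: str) -> float:
    """Return how many times more `expensive` costs per token than `cheap`."""
    return PRICE_PER_MILLION_TOKENS[expensive] / PRICE_PER_MILLION_TOKENS[cheap]


if __name__ == "__main__":
    corpus_tokens = 500_000_000  # hypothetical corpus size, for illustration only
    for model in PRICE_PER_MILLION_TOKENS:
        print(f"{model}: ${embedding_cost(model, corpus_tokens):.2f}")
    ratio = cost_ratio("text-embedding-3-large", "text-embedding-3-small")
    print(f"large / small price ratio: {ratio:.1f}x")
```

For the hypothetical 500M-token corpus this works out to roughly $10 with small versus $65 with large. Re-embedding after a chunking change repeats that cost each time.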
Journey Context:
text-embedding-3-large \(3072-dim\) captures the finer semantic nuance needed for cross-lingual retrieval and for retrieval that depends on abstract reasoning. text-embedding-3-small \(1536-dim\) suffices for domain-specific factual retrieval where query and document vocabulary overlap heavily, which is the typical RAG case. The quality cliff appears in multilingual retrieval and zero-shot domain transfer, where the large model's extra capacity supplies representational power that small lacks. For monolingual internal documentation, small is the more cost-effective choice.
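Before switching models, it may be worth checking whether the quality gap holds on your own queries. The sketch below computes MRR@10 from ranked results and the relative drop between two models. It assumes you already have, for each query, a ranked list of document IDs from each model and a set of known-relevant IDs. The function names are hypothetical, and reading the "<2%" figure as a relative rather than absolute drop is an assumption.

```python
from typing import Sequence, Set


def mrr_at_k(
    ranked_ids: Sequence[Sequence[str]],
    relevant_ids: Sequence[Set[str]],
    k: int = 10,
) -> float:
    """Mean reciprocal rank of the first relevant document within the top k."""
    if len(ranked_ids) != len(relevant_ids):
        raise ValueError("need one relevant-ID set per ranked list")
    if not ranked_ids:
        return 0.0
    total = 0.0
    for ranking, relevant in zip(ranked_ids, relevant_ids):
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_ids)


def relative_degradation(baseline: float, candidate: float) -> float:
    """Fractional drop of `candidate` relative to `baseline` (0.02 means 2%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline
```

A possible usage pattern: run the same labelled query set through both models, call `mrr_at_k` on each set of rankings, and pass the large-model score as `baseline` and the small-model score as `candidate`. If the result on multilingual or out-of-domain queries is well above 0.02, that is the quality cliff described above.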
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:11:19.140953+00:00— report_created — created