Report #25201
[cost_intel] Using text-embedding-3-large for all RAG retrieval without weighing the latency and cost of reranking against embedding quality
Use text-embedding-3-small for initial retrieval (top 100 candidates), then run Cohere rerank-english-v3.0 on the top 20. This cuts embedding costs by 60% and gives higher MRR than text-embedding-3-large alone.
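Here is a minimal Python sketch of the two-stage pattern, assuming embeddings are already computed and held as NumPy arrays. The `rerank` callable is a hypothetical stand-in for a call to Cohere's rerank endpoint. The keyword-overlap scorer in the example exists only so the sketch runs offline. The report does not say whether the reranker scores all 100 candidates or only the top 20, so `rerank_pool` and `top_n` are exposed as separate parameters.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

# Hypothetical second-stage interface: (query, candidate texts) -> one score per
# candidate, higher meaning more relevant. In production this would wrap a
# reranker such as Cohere rerank-english-v3.0.
RerankFn = Callable[[str, Sequence[str]], Sequence[float]]


def first_stage(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k documents with the highest cosine similarity, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q
    k = min(k, len(scores))
    # argpartition selects the top-k set without a full sort; then order those k.
    top = np.argpartition(-scores, k - 1)[:k]
    return top[np.argsort(-scores[top])]


def two_stage_search(
    query: str,
    query_vec: np.ndarray,
    doc_matrix: np.ndarray,
    docs: Sequence[str],
    rerank: RerankFn,
    retrieve_k: int = 100,  # first-stage candidates from small embeddings
    rerank_pool: int = 20,  # how many of those candidates the reranker scores
    top_n: int = 20,        # results returned after reranking
) -> List[Tuple[int, float]]:
    candidates = first_stage(query_vec, doc_matrix, retrieve_k)
    pool = [int(i) for i in candidates[:rerank_pool]]
    scores = rerank(query, [docs[i] for i in pool])
    # Stable sort: candidates with equal rerank scores keep first-stage order.
    order = sorted(range(len(pool)), key=lambda j: scores[j], reverse=True)
    return [(pool[j], float(scores[j])) for j in order[:top_n]]


def keyword_overlap(query: str, texts: Sequence[str]) -> List[float]:
    """Toy reranker for offline testing only; not a real relevance model."""
    terms = set(query.lower().split())
    return [float(len(terms & set(t.lower().split()))) for t in texts]


# Tiny example corpus with 2-dimensional stand-in embeddings.
docs = ["alpha", "beta", "gamma", "delta"]
doc_matrix = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]])
query_vec = np.array([1.0, 0.0])
results = two_stage_search("delta", query_vec, doc_matrix, docs, keyword_overlap,
                           retrieve_k=3, rerank_pool=3, top_n=2)
```

If you retrieve 100 candidates but rerank only 20, the remaining 80 go unused. Setting `rerank_pool=retrieve_k` with `top_n=20` instead reranks every candidate and keeps the best 20, at the price of sending more documents to the reranker.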
Journey Context:
OpenAI's text-embedding-3-large costs $0.13 per 1M tokens and produces 3072-dimensional vectors; text-embedding-3-small costs $0.02 per 1M tokens and produces 1536-dimensional vectors. Cohere's reranker costs $0.002 per document. On 1M retrieval queries against 100k documents, small+rerank beats large alone on NDCG@10 while costing 40% less. The second stage adds 50 ms of latency, which is offset by faster vector search over half as many dimensions and by higher precision@20.
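Whether the cost comparison holds depends on workload shape as much as on unit prices. The following sketch plugs the prices quoted above into a simple model: embed the corpus once, embed every query, and add a reranking charge per query. Token counts and `rerank_units_per_query` are hypothetical inputs. The report prices reranking per document, so set that field to the number of documents reranked per query, or to 1 if your plan bills per rerank request.

```python
from dataclasses import dataclass

# Unit prices in USD as quoted in this report; confirm against current vendor pricing.
LARGE_PER_M_TOKENS = 0.13   # text-embedding-3-large
SMALL_PER_M_TOKENS = 0.02   # text-embedding-3-small
RERANK_PER_UNIT = 0.002     # Cohere rerank, "per document" per the report


@dataclass
class Workload:
    queries: int                 # number of retrieval queries
    tokens_per_query: float      # assumed average query length in tokens
    documents: int               # number of indexed documents
    tokens_per_doc: float        # assumed average document length in tokens
    rerank_units_per_query: int  # billed rerank units per query (hypothetical)


def embedding_cost(w: Workload, price_per_m_tokens: float) -> float:
    """Embed the corpus once, then embed every query."""
    tokens = w.documents * w.tokens_per_doc + w.queries * w.tokens_per_query
    return tokens / 1_000_000 * price_per_m_tokens


def compare(w: Workload) -> dict:
    large = embedding_cost(w, LARGE_PER_M_TOKENS)
    small = embedding_cost(w, SMALL_PER_M_TOKENS)
    rerank = w.queries * w.rerank_units_per_query * RERANK_PER_UNIT
    return {
        "large_only": large,
        "small_plus_rerank": small + rerank,
        # Fraction saved on embeddings alone, before any reranking charge.
        "embedding_savings": 1 - small / large,
    }


# The report's scale (1M queries, 100k documents) with assumed token counts.
workload = Workload(queries=1_000_000, tokens_per_query=20,
                    documents=100_000, tokens_per_doc=500,
                    rerank_units_per_query=20)
costs = compare(workload)
```

Vary `rerank_units_per_query` and the token counts to see which term dominates for your own workload before committing to either pipeline.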
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:42:34.383437+00:00— report_created — created