Report #68718
[cost_intel] Using text-embedding-3-large for all retrieval guarantees the best RAG quality
Use text-embedding-3-small for initial retrieval (top-100 candidates), then rerank with a cross-encoder or GPT-4o-mini. This cuts embedding costs by about 6.5x ($0.02 vs $0.13 per 1M tokens), with a recall drop of less than 1% versus using large embeddings alone.
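A minimal sketch of the two-stage pipeline, in pure Python so it runs without API keys. It assumes the embeddings were already computed (for example with text-embedding-3-small) and are passed in as plain lists of floats. `retrieve_then_rerank`, `rerank_score` and the toy `overlap` scorer are hypothetical names; `overlap` only stands in for a real cross-encoder or a GPT-4o-mini relevance judge.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    docs: Sequence[str],
    rerank_score: Callable[[str, str], float],
    k_candidates: int = 100,
    k_final: int = 10,
) -> list[int]:
    """Return indices of the top k_final documents after two-stage ranking."""
    # Stage 1 (cheap): rank the whole corpus by embedding similarity and keep
    # only the top k_candidates. The vectors are assumed to come from
    # text-embedding-3-small.
    by_similarity = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_similarity[:k_candidates]

    # Stage 2 (expensive per call, but only k_candidates calls): rescore the
    # candidates with the reranker and keep the best k_final.
    reranked = sorted(
        candidates,
        key=lambda i: rerank_score(query, docs[i]),
        reverse=True,
    )
    return reranked[:k_final]


# Toy usage: 2-d vectors and a keyword-overlap scorer standing in for a
# cross-encoder. Both are hypothetical placeholders.
docs = [
    "reset your password",
    "billing cycle dates",
    "password reset email not arriving",
]
doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]
query_vec = [1.0, 0.2]


def overlap(query: str, doc: str) -> float:
    """Hypothetical reranker: number of words shared by query and document."""
    return float(len(set(query.split()) & set(doc.split())))


top = retrieve_then_rerank(
    "password reset email", query_vec, doc_vecs, docs, overlap,
    k_candidates=2, k_final=1,
)
print(top)  # → [2]
```

In the toy run, vector similarity ranks document 0 first, but the reranker promotes document 2 out of the candidate set. The cheap first stage only has to get relevant documents into the top-k candidates; the reranker fixes the ordering. The brute-force scan is for illustration only; a real corpus would sit behind a vector index.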
Journey Context:
text-embedding-3-large costs about 6.5x more than small but gains only about 2.3 points on OpenAI's reported MTEB average (64.6 vs 62.3). In RAG, the quality of the final top-10 is dominated by the reranking stage, not by embedding quality, as long as the relevant documents make it into the top-100 candidates. Common mistake: using large embeddings for a massive corpus (1M documents). Corpus embedding costs scale linearly with corpus size, while reranking costs scale with query volume (usually smaller). Degradation signature: small embeddings retrieve a slightly noisier top-100 (more false positives), but a lightweight cross-encoder (or even GPT-4o-mini judging relevance) filters these out with 95% precision. If you skip reranking, accuracy drops by 15%.
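A back-of-the-envelope cost model for the scaling argument above. The per-1M-token prices are OpenAI's published list prices for the two embedding models and should be verified against the current pricing page. The 500-tokens-per-document average and the function names are assumptions for illustration, and reranking is modeled as a hypothetical flat cost per query because it depends on the reranker chosen and how many candidates it rescores.

```python
# Assumed list prices in USD per 1M tokens; verify against current OpenAI pricing.
PRICE_PER_1M = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}


def corpus_embedding_cost(model: str, num_docs: int, avg_tokens_per_doc: int) -> float:
    """Cost (USD) to embed a whole corpus once; grows linearly with num_docs."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * PRICE_PER_1M[model]


def rerank_cost(num_queries: int, cost_per_query: float) -> float:
    """Reranking cost (USD); grows with query volume, not corpus size.

    cost_per_query is a hypothetical input: it depends on the reranker
    (self-hosted cross-encoder vs. an LLM judge) and on the number of
    candidates rescored per query.
    """
    return num_queries * cost_per_query


# Example: 1M documents at an assumed average of 500 tokens each.
small = corpus_embedding_cost("text-embedding-3-small", 1_000_000, 500)
large = corpus_embedding_cost("text-embedding-3-large", 1_000_000, 500)
print(f"small: ${small:.2f}  large: ${large:.2f}  ratio: {large / small:.1f}x")
# → small: $10.00  large: $65.00  ratio: 6.5x
```

The corpus term grows with `num_docs` and is paid again on every full re-index, whereas `rerank_cost` grows only with `num_queries`.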
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:49:43.064591+00:00— report_created — created