Report #86580
[cost\_intel] Upgrading to text-embedding-3-large is the only way to improve RAG retrieval accuracy
Use ada-002 or text-embedding-3-small with Cohere Rerank-v3-top-3 on retrieved chunks instead of upgrading to text-embedding-3-large; this achieves \+18% recall@5 over 3-large alone at 60% lower cost for high-volume pipelines exceeding 100k queries daily.
Journey Context:
text-embedding-3-large costs 2.5x ada-002 and 7.5x 3-small per token. Rerankers \(cross-encoders\) add ~$0.001-0.002 per query but only process top-k retrieved chunks \(20-100 tokens each vs full corpus embedding\). For high volume, embedding everything with large model is wasteful; small embedding \+ reranker gives better accuracy with 3x lower total cost. The quality cliff for 3-small is on cross-lingual or highly semantic queries where reranker compensates. Common error: embedding entire corpus with 3-large 'for quality' when 80% of queries need only simple semantic matching handled by ada-002 \+ reranker.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:54:43.328960+00:00— report_created — created