Report #85429
[cost_intel] Using text-embedding-3-large for all RAG retrieval overpays roughly 5x when the requirement is high recall
Use text-embedding-3-small for first-stage retrieval (top-100 candidates), then re-rank with Cohere Rerank or a cross-encoder. This reduces embedding costs by about 5x while keeping MRR@10 within 2% of the large model. Reserve text-embedding-3-large for final-stage clustering or when more than 1536 dimensions are needed (the native output size of the small model).
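A minimal sketch of the two-stage flow described above, under stated assumptions. The `embed` and `rerank_score` callables are hypothetical stand-ins: in a real pipeline `embed` would wrap a call to text-embedding-3-small and `rerank_score` would wrap Cohere Rerank or a cross-encoder. The toy versions here exist only so the example runs offline and are not meant to reflect real model quality.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_search(
    query: str,
    docs: Sequence[str],
    embed: Callable[[str], Vector],
    rerank_score: Callable[[str, str], float],
    first_stage_k: int = 100,
    final_k: int = 10,
) -> List[str]:
    # Stage 1: cheap bi-encoder recall. In production the document vectors
    # would be precomputed and stored in a vector index, not embedded per query.
    q_vec = embed(query)
    candidates = sorted(
        docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True
    )[:first_stage_k]
    # Stage 2: the expensive scorer only sees the shortlist.
    reranked = sorted(
        candidates, key=lambda d: rerank_score(query, d), reverse=True
    )
    return reranked[:final_k]


def toy_embed(text: str) -> Vector:
    # Hypothetical stand-in for an embeddings API call: letter counts.
    lowered = text.lower()
    return [lowered.count(c) for c in "abcdefghijklmnopqrstuvwxyz"]


def toy_rerank_score(query: str, doc: str) -> float:
    # Hypothetical stand-in for a cross-encoder: fraction of query words in doc.
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


docs = [
    "reset your password from the account page",
    "billing cycles and invoices",
    "password rules for new accounts",
]
print(two_stage_search("how do I reset my password", docs,
                       toy_embed, toy_rerank_score,
                       first_stage_k=3, final_k=2))
```

Keeping both models behind plain callables makes it straightforward to swap the first-stage model between small and large and compare results on the same queries before committing to either.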
Journey Context:
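Embedding cost scales linearly with the number of tokens embedded, and the per-token price rises with model size. For RAG, first-stage recall depends more on retrieving a broad candidate set than on fine-grained embedding precision. Two-stage retrieval (a small bi-encoder followed by a heavier cross-encoder) is standard in the IR literature and cuts costs substantially. According to this report, the quality cliff appears only in zero-shot classification, not in retrieval.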
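Since the cost and quality figures in this report are unverified, one way to check them on your own data is sketched below. All function names are illustrative, the prices in the usage lines are placeholders rather than real rates, and "within 2%" is assumed here to mean a relative difference (it could also be read as absolute).

```python
from typing import Sequence, Set


def embedding_cost(total_tokens: int, price_per_million_tokens: float) -> float:
    """Linear cost model: tokens embedded times the per-token price."""
    return total_tokens / 1_000_000 * price_per_million_tokens


def mrr_at_k(
    rankings: Sequence[Sequence[str]],
    relevant: Sequence[Set[str]],
    k: int = 10,
) -> float:
    """Mean reciprocal rank of the first relevant document within the top k."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings) if rankings else 0.0


def within_tolerance(
    candidate_mrr: float, baseline_mrr: float, rel_tol: float = 0.02
) -> bool:
    # Assumption: "within 2%" is interpreted as a relative drop of at most 2%.
    return candidate_mrr >= baseline_mrr * (1.0 - rel_tol)


# Placeholder prices per million tokens; substitute the provider's current rates.
small_cost = embedding_cost(50_000_000, price_per_million_tokens=1.0)
large_cost = embedding_cost(50_000_000, price_per_million_tokens=5.0)
print(large_cost / small_cost)
```

Running `mrr_at_k` on the same labelled query set for both pipelines, then passing the two scores to `within_tolerance`, gives a concrete go/no-go signal before switching models.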
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:58:53.663714+00:00— report_created — created