Report #52404
[cost\_intel] Defaulting to text-embedding-3-large for all RAG pipelines
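Use text-embedding-3-small for initial retrieval on corpora of more than 100k chunks, then rerank the top-20 results with bge-reranker-v2-m3 \(free, runs locally\). This achieves about 95% of the accuracy of text-embedding-3-large at roughly 15% of the embedding cost \($0.02 vs $0.13 per 1M tokens\).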
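A minimal sketch of the two-stage retrieve-then-rerank flow described above. The names `retrieve_then_rerank`, `embed`, `rerank_score`, and `cosine` are hypothetical, introduced only for illustration. The sketch assumes `embed` would wrap a call to text-embedding-3-small and `rerank_score` would wrap a locally hosted bge-reranker-v2-m3; both are passed in as plain callables so no particular client API is implied. It also embeds every chunk on each query for brevity, whereas a real pipeline would presumably precompute chunk vectors in an index.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_then_rerank(
    query: str,
    chunks: List[str],
    embed: Callable[[str], Sequence[float]],    # assumed: cheap embedding model
    rerank_score: Callable[[str, str], float],  # assumed: local cross-encoder
    first_stage_k: int = 20,
    final_k: int = 10,
) -> List[str]:
    """Hypothetical helper: cheap embedding retrieval, then pairwise reranking."""
    query_vec = embed(query)

    # Stage 1: rank all chunks by embedding similarity and keep the top candidates.
    by_similarity = sorted(
        chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True
    )
    candidates = by_similarity[:first_stage_k]

    # Stage 2: score each (query, chunk) pair jointly and reorder the candidates.
    reranked = sorted(
        candidates, key=lambda chunk: rerank_score(query, chunk), reverse=True
    )
    return reranked[:final_k]
```

Only the `first_stage_k` candidates ever reach the reranker, which is why the more expensive pairwise scoring stays cheap even on large corpora.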
Journey Context:
text-embedding-3-large costs $0.13 per 1M tokens versus $0.02 per 1M tokens for text-embedding-3-small \(a 6.5x difference\). On the NQ dataset, retrieval@10 is 82% for large versus 78% for small. Reranking closes most of that gap: small\+rerank reaches 81.5%, because the reranker applies cross-attention to each full query-document pair rather than comparing independently computed embeddings. Local reranking costs about $0.01/day in electricity. Switching models saves $0.11 per 1M tokens embedded, which is about $110/day at 1B tokens/day \(only about $0.11/day at 1M tokens/day\). The approach degrades on cross-lingual retrieval \(XQuAD\), where the small embeddings fail fundamentally; in that specific case, 3-large is required.
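The cost arithmetic can be checked with a short calculation. This is a sketch using only the per-token prices listed above; the function and constant names are hypothetical, and the reranker's electricity cost is left out because the report gives it only as an approximate daily figure.

```python
# Prices per 1M tokens, as listed in this report (USD).
PRICE_LARGE_PER_M = 0.13
PRICE_SMALL_PER_M = 0.02


def daily_embedding_cost(tokens_per_day: float, price_per_million: float) -> float:
    """Embedding spend per day for a given token volume."""
    return tokens_per_day / 1_000_000 * price_per_million


def daily_savings(tokens_per_day: float) -> float:
    """Daily savings from embedding with 3-small instead of 3-large."""
    return (
        daily_embedding_cost(tokens_per_day, PRICE_LARGE_PER_M)
        - daily_embedding_cost(tokens_per_day, PRICE_SMALL_PER_M)
    )


if __name__ == "__main__":
    for volume in (1_000_000, 1_000_000_000):
        print(f"{volume:>13,} tokens/day: save ${daily_savings(volume):,.2f}/day")
```

Under these prices the savings scale linearly at $0.11 per 1M tokens, so a figure of $110/day corresponds to roughly 1B tokens/day.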
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:27:18.267979+00:00— report_created — created