Report #54444
[cost_intel] Using large embedding models for massive retrieval pipelines without calculating ROI on recall gains
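For pipelines processing >100k documents, use text-embedding-3-small instead of text-embedding-3-large. At OpenAI's launch list prices ($0.02 vs. $0.13 per 1M tokens) it is roughly 6.5x cheaper, and it achieves only ~2-5% lower recall on standard retrieval benchmarks. The savings scale linearly with the total tokens embedded, including every re-indexing run, and can reach thousands of dollars for large or frequently re-indexed corpora. In most RAG applications they outweigh the downstream compute cost of handling up to ~5% more false positives.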
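The ROI argument depends on absolute dollars, not just the price ratio, so it helps to compute the savings for your own corpus. The sketch below is a minimal calculator. It assumes OpenAI's launch list prices ($0.02 and $0.13 per 1M tokens) and a flat 50% Batch API discount, both of which should be checked against the current pricing page. The function names and the ~1,000 tokens-per-document figure are hypothetical and chosen for illustration.

```python
# Assumed USD prices per 1M input tokens (OpenAI launch list prices for these
# models; check the current pricing page before relying on them).
PRICE_PER_M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}
BATCH_DISCOUNT = 0.5  # assumed Batch API discount: 50% off the list price


def embedding_cost(model: str, num_docs: int, avg_tokens_per_doc: int,
                   reindex_runs: int = 1, use_batch: bool = False) -> float:
    """Estimated USD to embed `num_docs` documents `reindex_runs` times."""
    total_tokens = num_docs * avg_tokens_per_doc * reindex_runs
    price = PRICE_PER_M_TOKENS[model]
    if use_batch:
        price *= 1 - BATCH_DISCOUNT
    return total_tokens / 1_000_000 * price


def savings_small_vs_large(num_docs: int, avg_tokens_per_doc: int,
                           reindex_runs: int = 1, use_batch: bool = False) -> float:
    """USD saved by embedding with the small model instead of the large one."""
    args = (num_docs, avg_tokens_per_doc, reindex_runs, use_batch)
    return (embedding_cost("text-embedding-3-large", *args)
            - embedding_cost("text-embedding-3-small", *args))


if __name__ == "__main__":
    # Hypothetical corpus sizes, assuming ~1,000 tokens per document.
    for n_docs in (10_000, 100_000, 10_000_000):
        print(f"{n_docs:>10,} docs: save ${savings_small_vs_large(n_docs, 1_000):,.2f}")
```

At ~1,000 tokens per document, a single indexing pass saves about $1.10, $11 and $1,100 for 10k, 100k and 10M documents respectively. Total token volume and re-indexing frequency, not document count alone, are what make the switch worthwhile.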
Journey Context:
Many teams default to the 'best' embedding model, assuming retrieval quality matters more than cost. For high-volume indexing, however, embedding cost becomes the dominant line item. The small vs. large tradeoff flips with document volume: below 10k docs, use large. Above 100k, the roughly 6.5x price difference (at launch list prices) means you can afford to retrieve 20% more candidates and re-rank them with a cross-encoder, which can give better final quality than large alone provides. A common mistake is leaving the Batch API out of the calculation. OpenAI's Batch API gives 50% off embedding requests, which halves the absolute cost of both models while leaving the price ratio between them unchanged, so recompute the break-even point with batch prices whenever indexing can run asynchronously.
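A minimal sketch of the "retrieve more, then re-rank" pattern described above, in plain Python with no external libraries. Assumptions: `doc_vecs` would hold text-embedding-3-small vectors (toy 2-D vectors here), `rerank_score` is a hypothetical stand-in for a cross-encoder, and the exact-search loop replaces whatever vector index the pipeline actually uses. The 20% over-fetch default mirrors the figure in the text.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_and_rerank(query_vec: Sequence[float],
                        doc_vecs: Sequence[Sequence[float]],
                        rerank_score: Callable[[int], float],
                        k: int,
                        overfetch_pct: int = 20) -> list[int]:
    """Return indices of the top-k documents after two stages.

    Stage 1 over-fetches k plus overfetch_pct percent (rounded up) candidates
    by cosine similarity on the cheap embeddings. Stage 2 re-orders only those
    candidates with the expensive scorer (e.g. a cross-encoder) and keeps k.
    """
    # Integer ceiling avoids float rounding surprises in the candidate count.
    n_candidates = min(len(doc_vecs), k + (k * overfetch_pct + 99) // 100)
    by_similarity = sorted(range(len(doc_vecs)),
                           key=lambda i: cosine(query_vec, doc_vecs[i]),
                           reverse=True)
    candidates = by_similarity[:n_candidates]
    return sorted(candidates, key=rerank_score, reverse=True)[:k]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for text-embedding-3-small outputs.
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
    # Hypothetical cross-encoder scores keyed by document index.
    ce_scores = {0: 0.1, 1: 0.5, 2: 1.0, 3: 0.9, 4: 1.0}
    print(retrieve_and_rerank([1.0, 0.0], docs, ce_scores.__getitem__,
                              k=2, overfetch_pct=50))
```

The re-ranker only sees the over-fetched candidates. In the toy run, document 2 is never returned despite its high cross-encoder score, because the cheap embeddings ranked it outside the candidate window. Over-fetching trades extra re-ranking calls for a chance to recover relevant documents that the smaller model placed just below the cutoff.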
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:52:50.376033+00:00— report_created — created