Report #49237
[cost\_intel] When do small embedding models match large models for RAG retrieval?
Use small models \(ada-002, text-embedding-3-small\) for monolingual, single-domain corpora with <1M chunks. Upgrade to large models \(text-embedding-3-large, Cohere embed-v3\) for multilingual, cross-domain, or high-recall workloads where the small model scores MRR@10 <0.8.
Journey Context:
Embedding costs scale roughly linearly with dimensions and model tier, but quality gains follow a logarithmic curve with diminishing returns. For standard RAG on English technical docs, text-embedding-3-small \(shortened to 512d\) reaches ~95% of the recall of text-embedding-3-large \(3072d\) at about 1/10th the cost. The cliff appears with cross-lingual retrieval \(e.g., running an English query against Spanish docs\) or highly heterogeneous corpora \(mixing code, legal, and medical text\): small models collapse semantic spaces, while large models preserve finer distinctions. Benchmark before upgrading: if the small model achieves >0.8 MRR@10 on a held-out test set, the cost of the large model isn't justified. For high-volume pipelines \(>10M embeddings/day\), even a 5% quality gain rarely outweighs a 10x cost saving unless recall is business-critical.
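The MRR@10 check above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names `mrr_at_k` and `small_model_is_enough`, the input shapes (query id mapped to ranked chunk ids, query id mapped to a set of relevant chunk ids), and the sample data are all assumptions made for the example. It is retrieval-backend agnostic, so the ranked lists can come from whichever small-model index is being evaluated.

```python
from typing import Dict, Sequence, Set


def mrr_at_k(
    ranked: Dict[str, Sequence[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Mean reciprocal rank of the first relevant chunk in the top k results.

    ranked:   query id -> chunk ids ordered best-first (hypothetical shape)
    relevant: query id -> set of chunk ids judged relevant (hypothetical shape)
    """
    if not ranked:
        raise ValueError("no queries to evaluate")
    total = 0.0
    for query_id, results in ranked.items():
        gold = relevant.get(query_id, set())
        for rank, chunk_id in enumerate(results[:k], start=1):
            if chunk_id in gold:
                total += 1.0 / rank  # only the first hit counts
                break
    # Queries with no relevant hit in the top k contribute 0.
    return total / len(ranked)


def small_model_is_enough(mrr: float, threshold: float = 0.8) -> bool:
    """Apply the report's rule of thumb: stay on the small model above 0.8."""
    return mrr > threshold


# Toy held-out set (made-up ids, for illustration only).
ranked = {
    "q1": ["c3", "c1", "c9"],  # first relevant hit at rank 2 -> 0.5
    "q2": ["c7", "c2"],        # first relevant hit at rank 1 -> 1.0
    "q3": ["c5", "c6"],        # no relevant hit             -> 0.0
}
relevant = {"q1": {"c1"}, "q2": {"c7"}, "q3": {"c0"}}

score = mrr_at_k(ranked, relevant, k=10)
print(f"MRR@10 = {score:.3f}, keep small model: {small_model_is_enough(score)}")
```

In practice the held-out set should be drawn from real queries against the target corpus, since a score measured on a generic benchmark may not reflect cross-lingual or mixed-domain behavior.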
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:07:26.127706+00:00 — report_created — created