Report #51507
[cost_intel] When does text-embedding-3-small match text-embedding-3-large on downstream quality?
Use text-embedding-3-small for clustering and anomaly detection: it matches text-embedding-3-large's silhouette score at 6.5x lower cost ($0.02 vs. $0.13 per 1M tokens). For retrieval tasks (RAG), the large model delivers 8-12% higher recall@10, which justifies its cost.
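To make the cost gap concrete, here is a minimal sketch of the arithmetic. The per-token prices are the ones quoted in this report. The workload size (5M documents at 400 tokens each) and the helper name `embedding_cost_usd` are hypothetical, chosen only for illustration.

```python
# Prices per 1M tokens as quoted in this report; verify current pricing before use.
PRICE_PER_MILLION_USD = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}


def embedding_cost_usd(total_tokens: int, model: str) -> float:
    """Estimate the cost in USD of embedding `total_tokens` tokens with `model`."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_USD[model]


# Hypothetical analytics workload: 5M documents averaging 400 tokens each.
total_tokens = 5_000_000 * 400

small = embedding_cost_usd(total_tokens, "text-embedding-3-small")
large = embedding_cost_usd(total_tokens, "text-embedding-3-large")

print(f"small: ${small:,.2f}  large: ${large:,.2f}  ratio: {large / small:.1f}x")
```

The ratio is independent of workload size, so the absolute savings scale linearly with the number of tokens embedded.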
Journey Context:
Engineers often default to the large embedding model on the assumption that bigger always means better quality. Clustering, however, relies on relative distances within a batch, where small's quantization noise averages out. Retrieval requires precise absolute distances across a large corpus, where small's lower dimensionality (1536 vs. 3072) loses fine-grained discrimination. The savings are substantial for analytics workloads such as clustering millions of documents, but using small for RAG leads to retrieval misses that visibly degrade answer quality.
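Before committing to either model for retrieval, it may be worth measuring recall@10 on your own labeled queries rather than relying on the figures above. The sketch below is one way to do that, assuming you have already embedded your queries and documents with each model. The function names and the toy 2-D vectors are hypothetical stand-ins, not part of any library or of the original report.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean_recall_at_k(query_vecs, doc_vecs, relevant, k=10):
    """Average recall@k over all queries for one embedding model."""
    scores = [
        recall_at_k(top_k(query_vecs[q], doc_vecs, k), relevant[q], k)
        for q in query_vecs
    ]
    return sum(scores) / len(scores)


# Toy 2-D vectors standing in for real embeddings (illustration only).
docs = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0]}
queries = {"q1": [1.0, 0.05]}
relevant = {"q1": {"d1", "d2"}}

print(mean_recall_at_k(queries, docs, relevant, k=2))
```

Running `mean_recall_at_k` once with vectors from each model on the same queries and relevance labels gives a direct comparison. For the clustering side, scikit-learn's `silhouette_score` can be used in the same way on each model's vectors.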
Lifecycle
2026-06-19T16:56:50.083779+00:00— report_created — created