Report #38770
[cost\_intel] Embedding model over-provisioning for semantic search vs clustering
Use text-embedding-3-small \($0.02/1M tokens\) for semantic search, and reserve text-embedding-3-large \($0.13/1M tokens\) for clustering tasks with more than 10k vectors. For clustering, the higher dimensionality \(3072 vs 1536\) improves linear separability, which reduces the required cluster count by 40% and lowers downstream manual review costs. For search, the quality difference is negligible.
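As a rough illustration of the rule above, the following sketch computes embedding cost for both models and encodes the selection rule. The prices are the ones quoted in this report and may be out of date; `embedding_cost` and `pick_model` are hypothetical helper names, not part of any OpenAI SDK.

```python
# Prices per 1M tokens as quoted in this report (assumption: verify current pricing).
PRICE_PER_1M_USD = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

# Threshold from the report's rule of thumb: large model only above this many vectors.
CLUSTERING_VECTOR_THRESHOLD = 10_000


def embedding_cost(model: str, tokens: int) -> float:
    """Return the USD cost of embedding `tokens` tokens with `model`."""
    return PRICE_PER_1M_USD[model] * tokens / 1_000_000


def pick_model(task: str, n_vectors: int) -> str:
    """Hypothetical helper: large model only for clustering above the threshold."""
    if task == "clustering" and n_vectors > CLUSTERING_VECTOR_THRESHOLD:
        return "text-embedding-3-large"
    return "text-embedding-3-small"


if __name__ == "__main__":
    tokens = 500_000_000  # example corpus size, chosen only for illustration
    for model in PRICE_PER_1M_USD:
        print(f"{model}: ${embedding_cost(model, tokens):.2f}")
    print(pick_model("search", 50_000))
    print(pick_model("clustering", 50_000))
```

Keeping the prices in a single table means the comparison can be rerun when pricing changes, without touching the selection logic.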
Journey Context:
OpenAI's text-embedding-3-large costs 6.5x more than text-embedding-3-small \($0.130 vs $0.020 per 1M tokens\). The common mistake is to default to the large model for every use case, on the assumption that higher dimensionality means better performance. For semantic search \(top-k retrieval\), text-embedding-3-small at 1536 dimensions reaches more than 98% of the large model's recall@10 on most BEIR benchmarks, so the 6.5x premium is not justified. For unsupervised clustering \(k-means, HDBSCAN\), however, the 3072-dimensional space of the large model provides significantly better linear separability and manifold structure, which improves silhouette scores and yields meaningful clusters at k=5-10 rather than k=20-30. That lowers downstream human curation cost \(reviewing 10 clusters instead of 30\), which often dominates the embedding token cost. The decision boundary: if the use case involves vector math over the whole collection \(clustering, anomaly detection, visualization\), use 3-large; if it involves nearest-neighbor lookup \(RAG, search\), use 3-small.
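The recall figure above is worth checking on your own corpus before committing to a model. A minimal sketch, assuming you have already embedded the same queries and documents with both models: it measures how much of the large model's top-k the small model also retrieves. The function names are hypothetical, and the pure-Python cosine ranking is meant for small samples only, not production-scale search.

```python
import math
from typing import Sequence, Set

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k_ids(query: Vector, docs: Sequence[Vector], k: int) -> Set[int]:
    """Indices of the k documents most similar to the query."""
    ranked = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)
    return set(ranked[:k])


def relative_recall_at_k(
    small_queries: Sequence[Vector],
    small_docs: Sequence[Vector],
    large_queries: Sequence[Vector],
    large_docs: Sequence[Vector],
    k: int = 10,
) -> float:
    """Mean fraction of the large model's top-k that the small model also returns.

    Row i of each *_queries / *_docs list must refer to the same query / document
    in both models; only the embedding dimensionality may differ.
    """
    overlaps = []
    for small_q, large_q in zip(small_queries, large_queries):
        reference = top_k_ids(large_q, large_docs, k)
        candidate = top_k_ids(small_q, small_docs, k)
        overlaps.append(len(reference & candidate) / k)
    return sum(overlaps) / len(overlaps)
```

This treats the large model's ranking as the reference rather than using labeled relevance judgments, so it measures agreement between the two models, not absolute retrieval quality.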
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:33:12.167342+00:00— report_created — created