Report #77175
[cost_intel] Defaulting to text-embedding-3-large for all embedding use cases on the assumption that 'larger is better'
Use text-embedding-3-small for clustering and anomaly detection on more than 100k documents, and reserve text-embedding-3-large for high-precision semantic search (retrieval). Small achieves 98% of large's clustering accuracy (v-measure) at roughly one-sixth the cost ($0.02 vs $0.13 per 1M tokens) and 2x the inference speed, while large maintains 15% higher recall@10 in retrieval.
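The recommendation above can be written as a small selection rule. This is a minimal sketch, not an official API: the task labels, the function name `choose_embedding_model`, and the `default` fallback are hypothetical, and only the two model names and the 100k-document threshold come from this report.

```python
# Hypothetical task labels for illustration; they are not part of any library.
CLUSTERING_TASKS = {"clustering", "anomaly_detection"}
RETRIEVAL_TASKS = {"semantic_search", "retrieval"}

# Threshold taken from the report: the saving matters above ~100k documents.
LARGE_CORPUS_THRESHOLD = 100_000


def choose_embedding_model(
    task: str,
    num_documents: int,
    default: str = "text-embedding-3-large",
) -> str:
    """Pick an embedding model following the report's rule of thumb.

    The report does not cover clustering on small corpora or other task
    types, so those cases return the caller-supplied `default` (an assumption).
    """
    if task in RETRIEVAL_TASKS:
        # High-precision retrieval: the report reserves the large model for this.
        return "text-embedding-3-large"
    if task in CLUSTERING_TASKS and num_documents > LARGE_CORPUS_THRESHOLD:
        # Large-scale grouping tasks: the small model is reported as sufficient.
        return "text-embedding-3-small"
    return default


if __name__ == "__main__":
    print(choose_embedding_model("clustering", 1_000_000))
    print(choose_embedding_model("retrieval", 5_000))
```

Keeping the fallback explicit makes it obvious which cases fall outside the report's guidance and should be benchmarked on your own data.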
Journey Context:
Embedding model selection follows the 'task geometry' principle. Clustering operates on relative distances in embedding space, and small models preserve local neighborhood structure well enough for grouping. Retrieval requires absolute semantic precision to distinguish between near-misses (e.g., 'Java' the island vs 'Java' the language), which is where the large model's granularity matters. The economic trap: embedding 1M documents for clustering costs $130 with large vs $20 with small (figures that imply about 1,000 tokens per document), with a negligible quality difference (v-measure delta <0.02). Conversely, using small for high-precision retrieval lowers recall (large holds 15% higher recall@10), which hurts RAG quality. The 1536-dim vs 3072-dim distinction matters most for nearest-neighbor search in 'high-curvature' regions of semantic space, where distinct meanings sit close together.
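The cost comparison can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the per-token prices are the ones quoted in this report and may change, the average of 1,000 tokens per document is inferred from the $130 vs $20 figures rather than stated by the report, and the function name `embedding_cost_usd` is hypothetical.

```python
# Prices per 1M tokens as quoted in the report; verify against current pricing.
PRICE_PER_MILLION_TOKENS_USD = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}


def embedding_cost_usd(
    model: str, num_documents: int, avg_tokens_per_document: int
) -> float:
    """Estimate the one-off cost of embedding a corpus with the given model."""
    total_tokens = num_documents * avg_tokens_per_document
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD[model]


if __name__ == "__main__":
    # Assumption: 1,000 tokens per document, which reproduces the report's totals.
    for model in PRICE_PER_MILLION_TOKENS_USD:
        cost = embedding_cost_usd(model, 1_000_000, 1_000)
        print(f"{model}: ${cost:.2f}")
```

With shorter documents the absolute saving shrinks proportionally, but the price ratio between the two models stays the same.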
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:08:14.157446+00:00— report_created — created