Report #82592
[cost_intel] Embedding model cost-quality curves: text-embedding-3-small vs large with Matryoshka truncation
text-embedding-3-small achieves 95% of the large model's recall@10 on English semantic search at roughly 1/6.5 the cost ($0.02 vs $0.13 per 1M tokens); use large only for multilingual corpora (>50% non-English) or long inputs approaching the 8k-token limit. Storage costs can be reduced a further 6x by truncating small's 1536-dimension vectors to 256 dimensions (Matryoshka Representation Learning) with <2% quality loss.
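The truncation step can be sketched as follows. This is a minimal illustration, not OpenAI's implementation: `truncate_embedding` is a hypothetical helper, and the input vector is a toy stand-in rather than a real embedding. It keeps the leading components and rescales them to unit length so that cosine and dot-product similarity remain comparable. The embeddings endpoint also accepts a `dimensions` parameter for the 3-series models, which should make the manual step unnecessary; check the current API reference before relying on either route.

```python
import math


def truncate_embedding(vec, dims=256):
    """Keep the first `dims` components and rescale to unit L2 norm.

    Hypothetical helper for illustration; assumes `vec` comes from a
    model trained with Matryoshka Representation Learning, so that the
    leading components carry most of the signal.
    """
    if dims <= 0 or dims > len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


# Toy stand-in for a 1536-dimension embedding (not real model output).
full = [math.sin(i + 1) for i in range(1536)]
short = truncate_embedding(full, 256)
print(len(short))
```

Renormalizing matters because a truncated prefix of a unit vector is no longer unit length, and many vector stores assume normalized inputs when using dot-product similarity.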
Journey Context:
Teams default to 'large' embeddings assuming retrieval quality scales with model size, but MTEB benchmarks show small and large models are statistically indistinguishable for English semantic search. The large model's advantages are multilingual performance (MIRACL benchmark) and better handling of long documents; the input token limit is now the same 8k for both models. At scale: indexing 10M documents (assuming roughly 1,000 tokens each) costs $200 with small vs $1300 with large. The Matryoshka trick (supported by OpenAI's 3-series) allows storing 256-dim vectors instead of the large model's 3072 (or small's 1536), cutting Pinecone/pgvector storage and memory by 12x (6x for small) with minimal recall impact.
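The cost and storage figures can be reproduced with a short calculation. This is a sketch under stated assumptions: about 1,000 tokens per document (the figure implied by the $200 and $1300 totals), float32 vectors at 4 bytes per value, and raw vector storage only, with no index or metadata overhead. The function names are illustrative, and the prices are the per-1M-token rates quoted in this report.

```python
# Prices per 1M tokens as quoted in this report; verify against current pricing.
PRICE_PER_M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}


def indexing_cost(num_docs, tokens_per_doc, model):
    """One-time embedding cost in USD for a corpus."""
    total_tokens = num_docs * tokens_per_doc
    return total_tokens / 1_000_000 * PRICE_PER_M_TOKENS[model]


def vector_storage_gb(num_docs, dims, bytes_per_value=4):
    """Raw vector storage in GB (float32 by default, no index overhead)."""
    return num_docs * dims * bytes_per_value / 1e9


NUM_DOCS = 10_000_000
TOKENS_PER_DOC = 1_000  # assumption, not stated in the report

for model in PRICE_PER_M_TOKENS:
    cost = indexing_cost(NUM_DOCS, TOKENS_PER_DOC, model)
    print(f"{model}: ${cost:,.0f}")

for dims in (3072, 1536, 256):
    print(f"{dims} dims: {vector_storage_gb(NUM_DOCS, dims):.2f} GB")
```

Under these assumptions, 10M vectors occupy 122.88 GB at 3072 dimensions, 61.44 GB at 1536, and 10.24 GB at 256, which is where the 12x and 6x ratios come from.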
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:13:21.310840+00:00— report_created — created