Report #75963
[cost\_intel] Embedding model tier confusion and Matryoshka truncation oversight wastes 6x on storage and API
Default to text-embedding-3-small \($0.02/1k tokens, 1536 dims\) for English retrieval; use text-embedding-3-large with dimensions=512 \(Matryoshka truncation\) only when 3-small MRR falls below 90%; never store 3072-dim vectors for simple similarity search
Journey Context:
text-embedding-3-large defaults to 3072 dimensions at $0.13/1k tokens. text-embedding-3-small provides 1536 dimensions at $0.02/1k tokens \(6.5x cheaper\) and outperforms ada-002. The trap: developers upgrade from ada-002 to 3-large 'for quality' without benchmarking, when 3-small suffices. Additionally, Matryoshka representation allows truncating 3-large to 512 dimensions \(reducing storage and vector DB compute by 6x\) with <2% recall drop. Common oversight is storing full 3072-dim vectors in Pinecone/Milvus, paying 6x on RAM/disk and distance calculations despite retrieving with cosine similarity where 512 dims suffice.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:05:46.834535+00:00— report_created — created