Report #69607
[cost_intel] Maximum embedding dimensions (3072) waste 12x vector storage vs 256-dim with MRL truncation
Use text-embedding-3-large with dimensions=256; increase only if retrieval benchmarks show a recall drop of more than 2%; store vectors with int8 quantization for a further 4x storage reduction relative to float32, or binary quantization for roughly 32x.
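The storage ratios above follow from simple arithmetic. A minimal sketch, assuming raw float32 vectors and ignoring index and metadata overhead; the function name `vector_bytes` and the corpus size are hypothetical, chosen only for illustration:

```python
def vector_bytes(dims: int, bits_per_component: int = 32) -> int:
    """Raw payload size of one vector, ignoring index and metadata overhead."""
    return dims * bits_per_component // 8


full = vector_bytes(3072)             # float32 at the default size: 12288 bytes
small = vector_bytes(256)             # float32 with dimensions=256: 1024 bytes
small_int8 = vector_bytes(256, 8)     # int8 scalar quantization: 256 bytes
small_binary = vector_bytes(256, 1)   # 1 bit per component: 32 bytes

n_vectors = 1_000_000  # hypothetical corpus size, for illustration only
rows = [
    ("3072 float32", full),
    ("256 float32", small),
    ("256 int8", small_int8),
    ("256 binary", small_binary),
]
for label, size in rows:
    gigabytes = size * n_vectors / 1e9
    print(f"{label:>12}: {size:>6} B/vector, {gigabytes:.3f} GB per million vectors")
```

Actual billing depends on the vector database's pricing model and index overhead, so treat these figures as a lower bound on the footprint rather than a cost estimate.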
Journey Context:
OpenAI's text-embedding-3-large defaults to 3072 dimensions. Many developers keep this default, assuming that more dimensions mean better retrieval. However, OpenAI uses Matryoshka Representation Learning (MRL), so the leading dimensions carry most of the information and an embedding can be shortened with little loss. At 256 dimensions, performance is ~98% of the full 3072 for most retrieval tasks. The cost trap is threefold: (1) 12x more storage in a vector DB (Pinecone, Weaviate), where the storage footprint, and therefore the bill, scales with dimension, (2) 12x higher memory usage during search, (3) slower queries. Alternatives like PCA post-processing are lossy and complex. The fix is explicit dimensionality reduction at the API call: set dimensions=256. Validate on your own dataset; increase the dimension only if recall@k drops by more than about 2%. For further storage savings, apply int8 or binary quantization.
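A minimal sketch of the fix and the validation step, assuming the official `openai` Python package and an `OPENAI_API_KEY` in the environment. The helper names (`embed`, `truncate`, `top_k`, `mean_recall_at_k`) are hypothetical, and the brute-force search stands in for whatever vector DB is actually used. It relies on the embeddings being unit length, so a dot product equals cosine similarity.

```python
import math

MODEL = "text-embedding-3-large"


def embed(texts, dimensions=256, client=None):
    """Request embeddings that the API has already shortened to `dimensions`."""
    if client is None:
        # Imported lazily so the helpers below work without the package installed.
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(
        model=MODEL, input=list(texts), dimensions=dimensions
    )
    return [item.embedding for item in response.data]


def truncate(vec, dimensions):
    """Shorten an already stored full-size embedding: keep the leading
    components, then re-normalize to unit length."""
    head = list(vec[:dimensions])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


def top_k(query, corpus, k):
    """Indices of the k corpus vectors with the highest dot product."""
    scores = [sum(q * c for q, c in zip(query, doc)) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]


def mean_recall_at_k(query_vecs, corpus_vecs, relevant_ids, k=10):
    """Average recall@k over all queries.

    relevant_ids[i] is the non-empty set of corpus indices labelled
    relevant to query i (your own ground truth).
    """
    total = 0.0
    for query, relevant in zip(query_vecs, relevant_ids):
        hits = set(top_k(query, corpus_vecs, k)) & set(relevant)
        total += len(hits) / len(relevant)
    return total / len(query_vecs)
```

To apply the rule above, embed the same labelled queries and documents once at 3072 and once at 256 dimensions, compute `mean_recall_at_k` for each, and keep 256 unless the difference exceeds the threshold you have chosen.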
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:19:04.638802+00:00— report_created — created