Report #81506
[cost_intel] Using 3072-dim embeddings when 512-dim provides near-identical retrieval MRR
Truncate embeddings from a model trained with Matryoshka Representation Learning (MRL), or reduce them with PCA, to 256-512 dims. Going from 3072 to 512 dims cuts vector DB storage by about 80% (3072 / 512 = 6x), typically with a <2% recall drop.
Journey Context:
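OpenAI's text-embedding-3-large returns 3072 dimensions by default and costs more per token than text-embedding-3-small (1536 dimensions). Many assume more dimensions = better retrieval, but for most RAG tasks the useful information is concentrated in the first 256-512 dimensions (power-law distribution of singular values). The trap is paying for 3072-dim storage and compute when 512 would suffice: vector DB memory and distance computation scale linearly with dimension, and larger vectors cache worse.

The fix relies on Matryoshka Representation Learning (MRL), which trains a model so that the leading dimensions of each embedding are usable on their own. It is supported by modern embedding models (text-embedding-3, and voyage-3 family models such as voyage-3-large). Store the full embedding, but index and query on the first 512 dimensions, re-normalizing the truncated vectors to unit length before computing cosine or dot-product similarity. The text-embedding-3 models also accept a `dimensions` request parameter that returns the shortened vector directly. Going from 3072 to 512 dims cuts vector DB memory and per-comparison compute by 6x with negligible recall drop on standard benchmarks (BEIR). Alternatively, for example when the model was not trained with MRL and plain truncation is not safe, use dimensionality reduction (PCA) fitted on your corpus. In either case, measure recall or MRR on your own queries before switching.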
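A minimal sketch of both options, assuming the embeddings are already available as a NumPy array of shape (n_vectors, 3072). The function names (`truncate_and_renormalize`, `fit_pca`, `apply_pca`, `float32_bytes`) are hypothetical helpers for illustration, not part of any vendor SDK, and the random data only stands in for real embeddings so the snippet runs on its own. It says nothing about retrieval quality.

```python
import numpy as np


def truncate_and_renormalize(embeddings: np.ndarray, dims: int) -> np.ndarray:
    """MRL-style shortening: keep the first `dims` columns, then rescale
    each row to unit L2 norm so cosine / dot-product scores stay comparable."""
    truncated = embeddings[:, :dims]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)


def fit_pca(corpus: np.ndarray, dims: int):
    """Fit PCA on corpus embeddings (needs at least `dims` rows).
    Returns the corpus mean and the top `dims` principal directions."""
    mean = corpus.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(corpus - mean, full_matrices=False)
    return mean, vt[:dims]


def apply_pca(embeddings: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project onto the fitted components and re-normalize each row.
    Queries must go through the same mean and components as the corpus."""
    projected = (embeddings - mean) @ components.T
    norms = np.linalg.norm(projected, axis=1, keepdims=True)
    return projected / np.clip(norms, 1e-12, None)


def float32_bytes(n_vectors: int, dims: int) -> int:
    """Raw float32 vector storage, ignoring index overhead and metadata."""
    return n_vectors * dims * 4


# Synthetic stand-in for real 3072-dim embeddings (assumption for the demo).
rng = np.random.default_rng(0)
full = rng.normal(size=(200, 3072)).astype(np.float32)

# Option 1: truncate an MRL embedding to its first 512 dims.
short = truncate_and_renormalize(full, 512)

# Option 2: PCA fitted on the corpus, then applied to corpus and queries alike.
mean, components = fit_pca(full, 64)
reduced = apply_pca(full, mean, components)

# Storage for 1M vectors: 12_288_000_000 bytes at 3072 dims
# versus 2_048_000_000 bytes at 512 dims, a 6x difference.
ratio = float32_bytes(1_000_000, 3072) / float32_bytes(1_000_000, 512)
print(short.shape, reduced.shape, ratio)
```

Keeping the full-length vectors in cheaper storage, as the report suggests, leaves room to re-index at a different dimension later or to rescore a shortlist with the full embedding if the truncated index turns out to lose too much recall.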
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:24:11.752555+00:00— report_created — created