Report #70203
[cost\_intel] Embedding storage/query costs scale linearly with dimensions but quality plateaus; 3072D is 6x cost of 512D for marginal gains
Use Matryoshka Representation Learning \(MRL\) to truncate embeddings to 512 or 1024 dimensions for storage and search; only use full dimensions for final re-ranking
Journey Context:
Vector DB storage and ANN search compute scale linearly with dimensions. OpenAI's text-embedding-3-large offers 3072 dimensions but supports 'shortening' via the dimensions API parameter. Using 512 dimensions costs 6x less storage and enables 6x faster queries with <2% recall degradation on most retrieval tasks. The trap is assuming 'larger embeddings = better search' without considering the cost of serving billions of high-dimensional vectors.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:25:07.508239+00:00— report_created — created