Report #84777
[cost\_intel] Matryoshka truncation of embedding dimensions destroys retrieval precision
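Use Matryoshka Representation Learning \(MRL\) to truncate OpenAI text-embedding-3-large from 3072 to 1024 dimensions. This cuts vector storage and per-query distance computation by roughly 3x, with under 1% loss in average MTEB score \(OpenAI reports 64.1 at 1024 dimensions versus 64.6 at 3072\). The per-token price of generating the embedding does not change; the savings are in storing and searching the vectors.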
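A minimal sketch of the truncation step, assuming you already hold full-length vectors and want to shorten them client-side. The helper name `truncate_embedding` is hypothetical, not part of any SDK. A shortened vector is no longer unit length, so it is rescaled before use with cosine or dot-product similarity. For text-embedding-3 models the OpenAI embeddings endpoint also accepts a `dimensions` request parameter that returns shortened vectors directly, which avoids this step.

```python
import math
from typing import Sequence


def truncate_embedding(vec: Sequence[float], dims: int) -> list[float]:
    """Keep the first `dims` components of an MRL-trained embedding
    and rescale the result to unit L2 norm.

    Hypothetical helper for illustration; only meaningful for models
    trained so that a prefix of the vector is itself a usable embedding.
    """
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # a zero vector has nothing to rescale
    return [x / norm for x in head]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))
```

Truncate documents and queries to the same length; vectors of different dimensionality cannot be compared.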
Journey Context:
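Legacy embeddings needed every dimension for accuracy. Models trained with MRL, such as OpenAI's text-embedding-3 family, concentrate the most important information in the leading dimensions, so a prefix of the vector is itself a usable embedding. Verify MRL support in the vendor's documentation before truncating any other model, because an embedding that was not trained this way degrades sharply when cut. Keeping the first third \(1024 of 3072 dimensions\) drops the finer detail carried by the trailing dimensions but retains the semantic core; re-normalize the shortened vector to unit length before computing cosine or dot-product similarity. Shorter vectors reduce vector DB storage costs \(often $0.10/GB/month\) roughly in proportion to dimension count, and they speed up HNSW search because each distance computation reads less memory. Two mistakes are common: using a 'large' model at full dimensions for simple semantic search where a 'small' or truncated model suffices, and truncating too far, e.g. to 256 dimensions, where the recall drop becomes significant. Reserve the full 3072 dimensions for fine-grained semantic similarity \(e.g., legal contract clause matching\) or multi-vector representations.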
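The storage arithmetic can be sketched as follows. Assumptions: vectors are stored as float32 \(4 bytes per component\), the corpus size of 10 million vectors is hypothetical, and the $0.10/GB/month rate is the illustrative figure from this report. Index overhead such as HNSW graph links and metadata is not counted, so real bills will be higher for both sizes.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_gb(num_vectors: int, dims: int) -> float:
    """Raw size of the vectors alone, in decimal gigabytes."""
    return num_vectors * dims * BYTES_PER_FLOAT32 / 1e9


def monthly_storage_cost(num_vectors: int, dims: int,
                         usd_per_gb_month: float = 0.10) -> float:
    """Estimated monthly storage cost for the raw vectors (assumed rate)."""
    return raw_vector_gb(num_vectors, dims) * usd_per_gb_month


if __name__ == "__main__":
    n = 10_000_000  # hypothetical corpus size
    for dims in (3072, 1024):
        gb = raw_vector_gb(n, dims)
        usd = monthly_storage_cost(n, dims)
        print(f"{dims} dims: {gb:.2f} GB, ${usd:.2f}/month")
```

Under these assumptions the raw vectors take about 122.9 GB at 3072 dimensions and about 41.0 GB at 1024, so the cost ratio is exactly the dimension ratio of 3.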
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:53:11.097612+00:00— report_created — created