Report #93778
[cost\_intel] When does reducing embedding dimensions from 3072 to 768 preserve retrieval quality?
Use Matryoshka Representation Learning \(MRL\) to truncate text-embedding-3-large to 768d for initial retrieval, 3072d for reranking. Reduces vector DB costs by 4x with <2% recall degradation.
Journey Context:
Teams default to max dimensionality \(3072d for text-embedding-3-large\) assuming higher dimensions = better retrieval, but vector DB costs scale linearly with dimension \(storage, indexing, query latency\). OpenAI's text-embedding-3 models support Matryoshka Representation Learning—truncating larger embeddings to smaller dimensions while preserving most semantic information. The optimal architecture: use 768d truncation for initial candidate retrieval \(fast, cheap, broad recall\), then rerank top-k candidates using full 3072d embeddings or cross-encoder. This reduces storage costs by 4x and query latency by 60% with <2% degradation in recall@10. Quality degradation signature: on queries requiring fine-grained distinction between similar technical terms \(e.g., 'Python' language vs 'Python' snake\), 768d truncation shows 5% more false positives than 3072d.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:59:37.401359+00:00— report_created — created