Report #54960
[cost_intel] Matryoshka embedding truncation not used, causing 3x vector DB storage costs
Truncate text-embedding-3-large vectors to their first 256 dimensions (MRL makes the leading dimensions usable on their own) and re-normalize them before storage; reserve the full 3072 dimensions for high-precision clustering. Before a full migration, verify that recall@10 degrades by less than 2% on your own corpus.
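A minimal sketch of the client-side truncation, assuming embeddings arrive as plain lists of floats. The helper name `truncate_embedding` is hypothetical, and the 4-dim toy vector only stands in for a real 3072-dim one. Rescaling to unit length after slicing keeps dot-product indexes consistent with cosine similarity.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dims: int = 256) -> List[float]:
    """Keep the first `dims` components and rescale to unit L2 norm."""
    if dims <= 0 or dims > len(vec):
        raise ValueError(f"dims must be in 1..{len(vec)}, got {dims}")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


# Toy example (assumption): a 4-dim unit vector standing in for a 3072-dim embedding.
full = [0.6, 0.0, 0.8, 0.0]
short = truncate_embedding(full, dims=2)
print(len(short), short)
```

The OpenAI embeddings endpoint also accepts a `dimensions` parameter for the text-embedding-3 models, which returns shortened vectors directly; client-side truncation is mainly useful when full-length vectors have already been generated.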
Journey Context:
OpenAI's text-embedding-3 models are trained with Matryoshka Representation Learning (MRL): the leading dimensions carry most of the information, and each later dimension adds progressively less. Storing full 3072-dim vectors in Pinecone or Weaviate costs roughly 3-4x more in storage and query compute than storing 256-dim vectors, while the shorter vectors typically give up only 2-5% retrieval accuracy on most corpora. The trap is assuming that more dimensions are universally better; for cosine similarity search, 256 dims often reach 95%+ of 3072-dim performance. What you give up is precision on hard negatives (near-duplicates). To implement this, benchmark recall@K on your own data first, since results differ by domain (legal docs vs. code vs. chat), then truncate the arrays client-side and re-normalize them to unit length before upserting. For hybrid search (dense + sparse), 256 dims suffice; reserve the full dimensions for clustering tasks that need maximum separability.
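The recall@K check described above can be sketched as follows. This is an illustrative brute-force version under stated assumptions: it treats the full-dimension top-K as ground truth (a real benchmark would ideally use labeled relevance judgments), all function names are hypothetical, and the synthetic 64-dim vectors are placeholders for real embeddings of your corpus and queries.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def truncate(vec: Vector, dims: int) -> List[float]:
    """First `dims` components, rescaled to unit L2 norm."""
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # avoid dividing by zero
    return [x / norm for x in head]


def top_k(query: Vector, corpus: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k corpus vectors with the highest dot product (brute force)."""
    scores = [sum(q * d for q, d in zip(query, doc)) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]


def recall_at_k(queries: Sequence[Vector], corpus: Sequence[Vector],
                dims: int, k: int = 10) -> float:
    """Mean overlap between the full-dim top-k and the truncated top-k."""
    full_dims = len(corpus[0])
    full_corpus = [truncate(d, full_dims) for d in corpus]
    short_corpus = [truncate(d, dims) for d in corpus]
    total = 0.0
    for q in queries:
        truth = set(top_k(truncate(q, full_dims), full_corpus, k))
        found = set(top_k(truncate(q, dims), short_corpus, k))
        total += len(truth & found) / k
    return total / len(queries)


# Synthetic placeholder data (assumption): variance decays with the dimension
# index to loosely mimic front-loaded information. Replace with real embeddings.
rng = random.Random(0)


def fake_vec(n: int = 64) -> List[float]:
    return [rng.gauss(0, 1) / math.sqrt(i + 1) for i in range(n)]


corpus = [fake_vec() for _ in range(200)]
queries = [fake_vec() for _ in range(20)]
for dims in (8, 16, 64):
    print(dims, round(recall_at_k(queries, corpus, dims), 3))
```

Run it per domain (legal docs, code, chat) and migrate only where the recall at 256 dims stays within the threshold you set.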
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:44:46.099619+00:00— report_created — created