Report #86751
[cost_intel] Storing full 3072-dimension embeddings without Matryoshka truncation
Truncate text-embedding-3-large embeddings to 512 dimensions for large-scale RAG: raw vector storage drops 6x (from 12 KB to 2 KB per float32 vector), and query latency drops about 40% with <3% retrieval accuracy loss on standard benchmarks.
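The storage figures follow from 4 bytes per float32 component. Below is a minimal sketch of the arithmetic. It assumes float32 storage and ignores index structures (e.g. HNSW graph links) and metadata. That overhead does not shrink in proportion to dimension count, so real savings in a given vector DB may be smaller.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def bytes_per_vector(dims: int, bytes_per_component: int = BYTES_PER_FLOAT32) -> int:
    """Raw payload size of one dense vector, excluding index and metadata overhead."""
    return dims * bytes_per_component


def corpus_gib(num_vectors: int, dims: int) -> float:
    """Raw payload for a whole corpus, in GiB (2**30 bytes)."""
    return num_vectors * bytes_per_vector(dims) / 2**30


full = bytes_per_vector(3072)
short = bytes_per_vector(512)
print(full, short, full / short)  # 12288 2048 6.0

# Hypothetical corpus of 10 million chunks, full vs. truncated.
print(round(corpus_gib(10_000_000, 3072), 1), round(corpus_gib(10_000_000, 512), 1))
```

Because the payload is linear in `dims`, the ratio is always 3072/dims regardless of corpus size.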
Journey Context:
OpenAI's text-embedding-3-large outputs 3072 dimensions by default, and vector DB costs (Pinecone, Weaviate, pgvector) scale linearly with dimension count. However, these models are trained with Matryoshka Representation Learning (MRL), so the most important information is concentrated in the leading dimensions and a prefix of the vector is itself a usable embedding. Truncating to 512 or 1024 dimensions preserves 95-97% of retrieval accuracy for semantic search while cutting storage 6x at 512 dimensions (3072/512) or 3x at 1024, and it also speeds up search. Re-normalize truncated vectors to unit length before indexing, or request shortened vectors directly through the API's `dimensions` parameter. Only use full dimensions for fine-grained semantic similarity that requires 99%+ accuracy.
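Truncation itself amounts to slicing plus re-normalization. Below is a minimal sketch in plain Python. It assumes the embedding arrives as a list of floats, for example from the OpenAI client, and the helper name `truncate_embedding` is hypothetical. Alternatively, the OpenAI embeddings endpoint accepts a `dimensions` parameter for text-embedding-3 models and returns already-shortened vectors.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` components (the Matryoshka prefix) and rescale to unit L2 norm.

    Hypothetical helper. Slicing alone leaves a vector whose norm is at most 1 and
    varies between vectors, so dot-product scores would no longer equal cosine
    similarity. The prefix is therefore re-normalized before it is stored or queried.
    """
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be in 1..{len(vec)}, got {dims}")
    prefix = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("prefix has zero norm and cannot be normalized")
    return [x / norm for x in prefix]


# Toy 3-component input; a real text-embedding-3-large vector has 3072 components.
print(truncate_embedding([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
```

Apply the same `dims` to stored documents and to incoming queries. An index built on 512-dimension vectors cannot be queried with 3072-dimension ones.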
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
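Measure the accuracy trade-off on your own data before switching. Below is a minimal sketch that assumes you have a sample of unit-normalized, full-dimension query and document embeddings. It reports overlap@k: the fraction of each query's full-dimension top-k neighbors that are still retrieved after truncation. This is a proxy for agreement with the full-dimension ranking, not the benchmark retrieval accuracy quoted above. The function names are hypothetical, and the brute-force search stands in for a real vector DB.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale v to unit L2 length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def top_k(query: Vector, corpus: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k corpus vectors with the highest dot product against query (brute force)."""
    scored = sorted(
        ((sum(q * c for q, c in zip(query, doc)), i) for i, doc in enumerate(corpus)),
        reverse=True,
    )
    return [i for _, i in scored[:k]]


def overlap_at_k(queries: Sequence[Vector], corpus: Sequence[Vector], dims: int, k: int = 10) -> float:
    """Mean fraction of each query's full-dimension top-k that survives truncation to `dims`.

    Assumes queries and corpus hold unit-normalized, full-length embeddings.
    """
    truncated_corpus = [normalize(doc[:dims]) for doc in corpus]
    total = 0.0
    for q in queries:
        full_hits = set(top_k(q, corpus, k))
        trunc_hits = set(top_k(normalize(q[:dims]), truncated_corpus, k))
        total += len(full_hits & trunc_hits) / k
    return total / len(queries)


if __name__ == "__main__":
    # Smoke test on random vectors only. Random Gaussian vectors have no Matryoshka
    # ordering, so overlap will be low here; run it on real embeddings instead.
    rng = random.Random(0)
    dim = 64
    corpus = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(500)]
    queries = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(20)]
    for d in (8, 16, 32, 64):
        print(d, round(overlap_at_k(queries, corpus, d), 3))
```

Run it at a few candidate sizes, such as 512 and 1024. Then pick the smallest size whose overlap is acceptable for your workload.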
Lifecycle
2026-06-22T04:12:11.875798+00:00— report_created — created