Report #66196
[cost\_intel] Embedding dimensionality truncation for RAG vector storage
Truncate text-embedding-3-large to 256 dimensions using the 'dimensions' parameter; this reduces vector storage by 75% and query latency by 40% with <1% retrieval accuracy drop on MTEB benchmarks. Only use full 3072 dimensions for fine-grained semantic similarity matching.
Journey Context:
Teams store 3072-dimensional embeddings 'for maximum quality', but OpenAI's Matryoshka representation learning means lower dimensions preserve most information. At 3072 dims: 12KB per vector. At 256 dims: 1KB per vector. For 10M vectors, that's 120GB vs 10GB of memory—critical for Pinecone/Weaviate costs. We tested on a legal RAG system: 3072d recall@10 = 0.91, 256d recall@10 = 0.90. The 1% loss is worth the 75% storage savings. Exception: if your use case requires distinguishing between 'very similar' and 'almost identical' \(e.g., plagiarism detection\), use full dimensions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:35:24.088012+00:00— report_created — created