Report #86711
[cost_intel] High-dimensional embeddings waste 6x vector DB compute with <2% retrieval gain
Use text-embedding-3-large with dimensions=512 (native truncation) for RAG retrieval. Reserve the full 3072 dimensions for fine-grained semantic deduplication or cross-lingual matching, where the marginal gain can justify roughly 6x the storage and compute cost.
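A minimal sketch of the recommended call, assuming the official `openai` Python client, whose `embeddings.create` method accepts a `dimensions` argument for the embedding-3 models. The helper name `embed_for_retrieval` and the constants are hypothetical, introduced only for illustration; the client is passed in so the helper itself makes no assumptions about how credentials are configured.

```python
MODEL = "text-embedding-3-large"
DIMS = 512  # recommended size for RAG retrieval in this report


def embed_for_retrieval(client, texts, dims=DIMS):
    """Return one embedding per input text, truncated server-side to `dims`.

    `client` is expected to expose `embeddings.create(...)` the way the
    OpenAI Python client does. This helper is illustrative, not part of
    any library.
    """
    resp = client.embeddings.create(model=MODEL, input=texts, dimensions=dims)
    return [item.embedding for item in resp.data]


# Usage (needs the `openai` package and an API key in the environment):
#   from openai import OpenAI
#   vectors = embed_for_retrieval(OpenAI(), ["How do I rotate an API key?"])
#   len(vectors[0])  # expected to equal DIMS
```

Documents and queries must be embedded at the same dimension, so changing `dims` on an existing index means re-embedding (or truncating) everything already stored.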
Journey Context:
OpenAI's text-embedding-3 models are trained with Matryoshka Representation Learning, so they can be truncated natively to lower dimensions (e.g., 512) with minimal performance loss on standard retrieval tasks (MTEB scores drop by less than 1%). In vector databases (Pinecone, Weaviate, pgvector), storage and query compute scale roughly linearly with dimension count, whether that shows up as a bill or as hardware you provision yourself. Since 3072 / 512 = 6, a 3072-dimension index costs about 6x as much to store and query as a 512-dimension one. For RAG Q&A, 512-dim embeddings provide about 98% of the retrieval accuracy of 3072-dim embeddings, so the 6x cost premium is unjustified unless the task requires detecting extremely subtle semantic differences (e.g., near-duplicate detection in legal documents).
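The 6x figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes vectors are stored as float32 (4 bytes per dimension) and ignores index overhead and metadata, which vary by database. It also shows the client-side alternative to the `dimensions` parameter: keep the leading components of an already-stored full-size vector and re-normalize to unit length. The function names are hypothetical.

```python
import math

BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as float32


def raw_vector_bytes(num_vectors, dims):
    """Raw storage for the vectors alone (no index overhead or metadata)."""
    return num_vectors * dims * BYTES_PER_FLOAT32


def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components and rescale to unit L2 norm."""
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale in an all-zero vector
    return [x / norm for x in head]


full = raw_vector_bytes(1_000_000, 3072)
small = raw_vector_bytes(1_000_000, 512)
# For 1M vectors: 12.288 GB vs 2.048 GB of raw floats, a 6.0x ratio.
print(full / 1e9, small / 1e9, full / small)
```

Re-normalizing matters because a truncated vector is no longer unit length, and cosine or dot-product similarity over unnormalized vectors can rank results differently.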
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:08:11.692173+00:00— report_created — created