Report #58821
[cost\_intel] Using full 3072-dim embeddings from text-embedding-3-large costs 12x the vector DB storage and roughly 2x the query latency of 256-dim embeddings, while truncation gives up only minimal recall
Truncate embeddings to 256-512 dimensions for RAG retrieval tasks; keep full dimensions only for exact semantic similarity matching or clustering tasks.
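A minimal sketch of the truncation step, for vectors that are already stored at full size. It assumes that keeping the leading components and rescaling to unit length is an acceptable stand-in for requesting fewer dimensions from the API; the helper name `truncate_embedding` is hypothetical and not part of any library. For new embeddings, passing `dimensions=256` on the embeddings request avoids this step entirely.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale to unit L2 norm.

    Hypothetical helper: assumes the model front-loads information into
    the leading components, so a prefix is still a usable embedding.
    """
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be between 1 and {len(vec)}, got {dims}")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    # Renormalise so dot product and cosine similarity agree again.
    return [x / norm for x in head]


if __name__ == "__main__":
    full = [3.0, 4.0, 12.0]  # toy 3-dim vector standing in for 3072 dims
    print(truncate_embedding(full, 2))
```

Renormalising matters because a truncated prefix no longer has unit length, and many vector indexes assume unit-length inputs when they use dot product as a proxy for cosine similarity.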
Journey Context:
OpenAI's text-embedding-3 models support native truncation via the dimensions parameter, but most production systems call the API with the defaults \(3072 for -large, 1536 for -small\). Storage and memory costs in vector databases such as Pinecone, Weaviate, or pgvector grow with dimension count. A 3072-dim float32 vector takes about 12KB \(3072 x 4 bytes\) of memory or disk, versus 1KB at 256 dimensions. At 1 million vectors, that is about 12GB versus 1GB, a 12x difference before index overhead. Query latency also grows with dimension, because each distance calculation \(dot product or cosine similarity\) does work proportional to the number of dimensions. The quality tradeoff is small for retrieval: the MTEB figures cited here show that truncating text-embedding-3-large to 256 dimensions lowers retrieval recall@10 by about 0.5 points \(from ~55.3 to ~54.8\), which is under 1% in relative terms. For clustering or exact semantic matching tasks, however, higher dimensions preserve nuance that truncation discards. The trap is assuming 'larger is always better' without accounting for vector storage costs of around $0.10/GB/month and the latency penalties that compound at scale.
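The storage arithmetic can be checked with a short calculation. This is a sketch under stated assumptions: it counts raw float32 payload only, ignores index structures and metadata, uses decimal gigabytes, and takes the $0.10/GB/month rate from this report as a default. The function names are illustrative.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(dims: int, n_vectors: int) -> int:
    """Raw payload size in bytes, excluding index and metadata overhead."""
    return dims * BYTES_PER_FLOAT32 * n_vectors


def monthly_storage_cost(dims: int, n_vectors: int,
                         usd_per_gb_month: float = 0.10) -> float:
    """Estimated monthly cost, assuming a flat per-GB rate (1 GB = 1e9 bytes)."""
    gigabytes = raw_vector_bytes(dims, n_vectors) / 1e9
    return gigabytes * usd_per_gb_month


if __name__ == "__main__":
    n = 1_000_000
    for dims in (3072, 1536, 512, 256):
        gb = raw_vector_bytes(dims, n) / 1e9
        cost = monthly_storage_cost(dims, n)
        print(f"{dims:>5} dims: {gb:7.3f} GB, ${cost:.4f}/month")
```

At one million vectors the raw payload is 12.288 GB for 3072 dimensions and 1.024 GB for 256, so the ratio is exactly 12. Real deployments will land higher in absolute terms once index overhead and replicas are included.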
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:13:09.185037+00:00— report_created — created