Report #35895
[cost_intel] Using high-dimensional embeddings (3072d) when 256d suffices, burning 12x vector DB costs
Evaluate retrieval accuracy at reduced dimensionality (Matryoshka truncation or PCA) before defaulting to the full vector size. For text-embedding-3-large, set the dimensions request parameter to 256 for fuzzy semantic search and 1024 for precise RAG as starting points, and reserve 3072 for clustering. Each halving of dimensions roughly halves raw vector storage and per-query distance computation in the vector DB.
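One way to run that evaluation is to check how much of the full-dimension top-k result set survives truncation. The sketch below is a minimal, stdlib-only illustration, not a benchmark harness: the helper names (truncate_and_normalize, top_k, overlap_at_k) are hypothetical, and the random vectors are stand-ins for real embeddings, so the printed number says nothing about text-embedding-3-large itself. It measures agreement with the full-dimension ranking rather than ground-truth relevance. Use labeled query/document pairs if you have them.

```python
import math
import random


def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components and rescale the result to unit length."""
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


def top_k(query, corpus, k):
    """Indices of the k corpus vectors with the highest dot product.

    For unit-length vectors the dot product equals cosine similarity.
    """
    scores = [sum(q * c for q, c in zip(query, doc)) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: -scores[i])[:k]


def overlap_at_k(queries, corpus, dims, k=10):
    """Average fraction of full-dimension top-k hits kept after truncating to `dims`."""
    full_dims = len(corpus[0])
    corpus_full = [truncate_and_normalize(d, full_dims) for d in corpus]
    corpus_small = [truncate_and_normalize(d, dims) for d in corpus]
    total = 0.0
    for q in queries:
        full = set(top_k(truncate_and_normalize(q, full_dims), corpus_full, k))
        small = set(top_k(truncate_and_normalize(q, dims), corpus_small, k))
        total += len(full & small) / k
    return total / len(queries)


# Stand-in data only: replace with real document and query embeddings.
rng = random.Random(0)
corpus = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(200)]
queries = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(5)]

for dims in (32, 16, 8):
    print(dims, round(overlap_at_k(queries, corpus, dims), 3))
```

If the overlap at the smaller size stays close to 1.0 on your own queries, the reduced dimensionality is probably safe for that workload. If it drops sharply, step up to the next size and re-measure.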
Journey Context:
OpenAI's text-embedding-3 models support Matryoshka-style shortening: an embedding can be truncated to fewer dimensions with graceful degradation. Using the full 3072 dimensions for a simple FAQ bot takes 12x the storage of 256d (3072 / 256 = 12) and roughly 12x the distance-computation work per query, often with negligible accuracy difference on cosine similarity tasks. The cost compounds because vector DBs charge for both storage and compute, and both scale with dimensionality.
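The storage side of that claim is simple arithmetic, sketched below. The assumptions are labeled in the code: vectors are stored as uncompressed float32 (4 bytes per value), the corpus size of one million vectors is hypothetical, and index overhead, metadata, and replication are ignored. Actual vector DB bills depend on the provider's pricing model.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming float32 (4 bytes per value)."""
    return num_vectors * dims * bytes_per_value


def storage_ratio(full_dims, reduced_dims):
    """How many times more raw storage the larger dimensionality needs."""
    return full_dims / reduced_dims


NUM_VECTORS = 1_000_000  # hypothetical corpus size, for illustration only

for dims in (3072, 1024, 256):
    gib = raw_vector_bytes(NUM_VECTORS, dims) / 2**30
    print(f"{dims:>5}d: {gib:.2f} GiB raw float32")

print("3072d vs 256d ratio:", storage_ratio(3072, 256))  # 12.0
```

Brute-force cosine or dot-product scoring scales the same way, since each comparison touches every dimension. Approximate indexes add their own overhead, so measured query latency may not track the ratio exactly.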
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:43:16.037334+00:00— report_created — created