Report #26582
[cost_intel] Using text-embedding-3-large with full 3072 dimensions for all retrieval tasks
Use text-embedding-3-small with the 'dimensions' parameter set to 512 for latency-critical retrieval. Reserve text-embedding-3-large at its full 3072 dimensions for tasks where distinguishing between semantically similar concepts (legal or medical nuances) is critical.
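A minimal sketch of the recommendation, assuming the OpenAI Python SDK (v1 or later) and its `client.embeddings.create` call with the `model`, `input`, and `dimensions` arguments. The helper name `embed_texts` and its defaults are hypothetical and only for illustration; the client is passed in so the helper itself does not depend on how credentials are configured.

```python
from typing import Any, Sequence


def embed_texts(
    client: Any,
    texts: Sequence[str],
    *,
    model: str = "text-embedding-3-small",
    dimensions: int = 512,
) -> list[list[float]]:
    """Return one shortened embedding per input text.

    `client` is assumed to be an `openai.OpenAI` instance (hypothetical wiring).
    The `dimensions` argument asks the API to return shortened vectors
    instead of the model's native size.
    """
    response = client.embeddings.create(
        model=model,
        input=list(texts),
        dimensions=dimensions,
    )
    # Each item in response.data carries the vector in its .embedding attribute.
    return [item.embedding for item in response.data]
```

Usage would look roughly like `embed_texts(OpenAI(), ["How do I reset my password?"])`, with `OpenAI` imported from the `openai` package. For the fine-grained case, the same helper could be called with `model="text-embedding-3-large"` and `dimensions=3072`.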
Journey Context:
Embedding dimensionality scales vector DB storage linearly (3072 dimensions take 6x the memory of 512) and raises search latency, since HNSW index size and the cost of each distance computation both grow with dimension. The text-embedding-3 models are trained with Matryoshka representation learning, so the 'dimensions' parameter can shorten an embedding with limited quality loss. This report puts text-embedding-3-small at 512 dimensions at about 95% of retrieval performance on standard tasks (FAQ matching, keyword search), and cites roughly 5x lower cost and 10x lower latency; none of these figures is sourced, so check current pricing and benchmark on your own data. The mistake is assuming 'larger model = better retrieval': in high-volume RAG, latency and storage costs dominate. Only use text-embedding-3-large with full dimensions when the task depends on fine-grained semantic distinctions (e.g., separating 'breach of contract' from 'breach of warranty' in legal precedent search), where the extra dimensions can capture subtle differences.
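The storage arithmetic above can be checked with a short sketch. It assumes vectors are stored as 4-byte float32 values and counts raw vector data only; HNSW graph links and metadata add overhead that varies by vector DB. The function names and the one-million-vector corpus are hypothetical, chosen for illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as float32


def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw storage for the vectors alone, excluding index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


def storage_ratio(dims_a: int, dims_b: int) -> float:
    """How many times more storage dims_a needs than dims_b."""
    return dims_a / dims_b


if __name__ == "__main__":
    corpus_size = 1_000_000  # hypothetical corpus
    for dims in (3072, 512):
        gigabytes = raw_vector_bytes(corpus_size, dims) / 1e9
        print(f"{dims} dims: {gigabytes:.3f} GB")
    print(f"ratio: {storage_ratio(3072, 512):.1f}x")
```

For one million vectors this gives about 12.3 GB at 3072 dimensions against about 2.0 GB at 512, a 6x difference before any index overhead.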
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:01:08.463246+00:00— report_created — created