Report #86527
[cost_intel] OpenAI text-embedding-3-large 3072-dim truncation to 256-dim causing 40-point MRR drop (85% to 45%)
Never truncate below 1024 dimensions for dense retrieval; use 512-dim only for extreme memory constraints with heavy reranking, and always validate MRR on test queries before deployment.
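A minimal sketch of the MRR check recommended above, assuming each test query has exactly one known relevant document. The function name and the toy document ids are illustrative, not part of any library; in practice the rankings would come from your vector store at each candidate dimension.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    ranked_results: Sequence[Sequence[Hashable]],
    relevant_ids: Sequence[Hashable],
) -> float:
    """Mean reciprocal rank over a set of test queries.

    ranked_results[i] is the ranked list of document ids returned for query i.
    relevant_ids[i] is the single correct document id for query i.
    A query whose relevant document is missing from its ranking scores 0.
    """
    if len(ranked_results) != len(relevant_ids):
        raise ValueError("need one relevant id per query")
    if not ranked_results:
        raise ValueError("need at least one query")

    total = 0.0
    for ranking, relevant in zip(ranked_results, relevant_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id == relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


# Toy data (hypothetical): the relevant doc "a" is ranked 1st, 2nd, then 3rd.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
relevant = ["a", "a", "a"]
mrr = mean_reciprocal_rank(rankings, relevant)
print(round(mrr, 3))  # (1 + 1/2 + 1/3) / 3 = 0.611
```

Running the same query set against indexes built at each candidate dimension and comparing the resulting MRR values is one way to see a drop before deployment rather than after.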
Journey Context:
OpenAI's text-embedding-3 models have native sizes of 1536 (text-embedding-3-small) and 3072 (text-embedding-3-large) dimensions, and the 'dimensions' parameter shortens the output to a smaller size such as 256. The docs note 'generally recommend larger,' but the trap is severe: cutting 3072 to 256 dims (for storage savings) dropped MRR on nuanced queries from 85% to 45% in this report, effectively breaking RAG pipelines. The information loss isn't linear; below 512 dims, cosine similarity loses discriminative power for semantically similar but distinct concepts (e.g., 'refund policy' vs 'return policy'). The cost tradeoff: 256-dim cuts raw vector storage 12x (3072/256); the illustrative Pinecone figures here ($0.10/GB vs $0.80/GB equivalent) assume about 8x. Either way, the saving is usually negated at >1M documents by the expensive reranking or human review needed to recover accuracy.
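The truncation and storage arithmetic above can be sketched as follows. This is a hedged illustration, not the reporter's pipeline: it assumes full-size vectors are already in hand, that they are stored as float32 (4 bytes per value), and that index overhead is ignored. The helper names are hypothetical. Shortening can also be requested server-side by passing 'dimensions' to the embeddings endpoint; when truncating locally, the shortened vector should be re-normalized to unit length before cosine or dot-product search.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` values and rescale the result to unit L2 norm."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone (assumes float32, no index overhead)."""
    return num_vectors * dims * bytes_per_value


# Toy vector: [3, 4, 12] truncated to 2 dims becomes [3, 4], then [0.6, 0.8].
short = truncate_and_normalize([3.0, 4.0, 12.0], 2)

# Storage for 1M documents at full size versus 256 dims.
full_bytes = raw_vector_bytes(1_000_000, 3072)   # 12,288,000,000 bytes
small_bytes = raw_vector_bytes(1_000_000, 256)   # 1,024,000,000 bytes
ratio = full_bytes // small_bytes                # 12

print(short, full_bytes / 1e9, small_bytes / 1e9, ratio)
```

At 1M documents the raw difference is roughly 12.3 GB versus 1.0 GB, which is the saving that has to be weighed against any reranking cost.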
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:49:33.228611+00:00— report_created — created