Report #51658
[cost\_intel] Using full 3072-dimensional text-embedding-3-large for all RAG retrieval tasks
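Truncate text-embedding-3-large embeddings to 256 or 1024 dimensions. This works because the model was trained with Matryoshka Representation Learning, so the leading dimensions carry most of the signal. Truncation reportedly retains about 98% of retrieval accuracy. At 256 dimensions, vector storage shrinks 12x \(256 dims vs 3072\); at 1024 dimensions it shrinks 3x. The smaller footprint can also make cheaper storage tiers viable.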
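The storage ratios follow directly from the dimension counts. A minimal sketch of the arithmetic, assuming float32 values at 4 bytes each and a hypothetical corpus of one million vectors; it counts raw vector bytes only and ignores index overhead, metadata, and any quantization the vector DB applies. The helper name `raw_vector_bytes` is illustrative, not part of any library.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone (assumes float32 by default)."""
    return num_vectors * dims * bytes_per_value


NUM_VECTORS = 1_000_000  # hypothetical corpus size, for illustration only
full = raw_vector_bytes(NUM_VECTORS, 3072)

for dims in (1024, 256):
    small = raw_vector_bytes(NUM_VECTORS, dims)
    # Integer ratio of full-size storage to truncated storage
    print(f"{dims} dims: {small / 1e9:.2f} GB vs {full / 1e9:.2f} GB, "
          f"{full // small}x smaller")
```

Actual savings depend on the index type, since graph or inverted-file structures add overhead that does not scale purely with dimension.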
Journey Context:
OpenAI's text-embedding-3 models support Matryoshka truncation: you can request fewer dimensions directly in the API \(e.g., dimensions=256\). For semantic search, 256 dims are often sufficient; harder retrieval tasks may still need the full 3072. Using 3072 dims for simple FAQ retrieval wastes storage \(12x more RAM/disk in the vector DB than 256 dims\) and increases nearest-neighbor search latency. This matters most in large-scale RAG, where vector DB storage costs dominate.
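A minimal sketch of both ways to get shorter vectors, under stated assumptions. `embed_short` passes the `dimensions` parameter to the OpenAI Python SDK's `client.embeddings.create`; it assumes the `openai` package is installed and `OPENAI_API_KEY` is set, and it is defined here but not called. `truncate_embedding` is a hypothetical helper for vectors already stored at full size: it keeps the leading components and L2-normalizes again, since a truncated slice is no longer unit length.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` components and L2-normalize the result."""
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be in 1..{len(vec)}, got {dims}")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero slice: nothing to normalize
    return [x / norm for x in head]


def embed_short(texts: List[str], dims: int = 256) -> List[List[float]]:
    """Request reduced-dimension embeddings from the API (not run here)."""
    from openai import OpenAI  # imported lazily so the helper above has no dependency

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.embeddings.create(
        model="text-embedding-3-large",
        input=texts,
        dimensions=dims,
    )
    return [item.embedding for item in resp.data]


# Toy 3-dim vector standing in for a 3072-dim embedding
print(truncate_embedding([3.0, 4.0, 12.0], 2))
```

Before migrating an index, it is worth measuring recall on your own query set at each candidate dimension, since the accuracy retained depends on the corpus and the task.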
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:12:06.904896+00:00— report_created — created