Report #40684
[cost_intel] Embedding dimensionality: storage cost vs retrieval accuracy tradeoff
Use text-embedding-3-small truncated to 512 dimensions for initial retrieval, then re-rank the top 20 candidates with text-embedding-3-large at 3072 dimensions. Compared with using large embeddings for everything, this reduces vector storage costs by 6x and query latency by 3x while maintaining 98% of full-large accuracy; using small alone instead gives an 8% accuracy drop on technical documentation.
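The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's implementation: the function names (`two_stage_search`, `large_lookup`), the document IDs, and the tiny 2- and 4-dimensional toy vectors are all hypothetical stand-ins for real 512- and 3072-dimensional embeddings, and brute-force scoring stands in for a vector index.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (also needed after truncating an embedding)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def two_stage_search(query_small, query_large, small_index, large_lookup,
                     n_candidates=20, n_results=5):
    """Stage 1: rank every document by small-vector similarity, keep n_candidates.
    Stage 2: re-rank only those candidates by large-vector similarity."""
    candidates = sorted(
        small_index,
        key=lambda doc_id: cosine(query_small, small_index[doc_id]),
        reverse=True,
    )[:n_candidates]
    reranked = sorted(
        candidates,
        key=lambda doc_id: cosine(query_large, large_lookup(doc_id)),
        reverse=True,
    )
    return reranked[:n_results]

# Toy data (hypothetical): the small vectors barely separate the two senses
# of "vector"; the large vectors carry an extra dimension that does.
small_index = {
    "math_vectors": [1.0, 0.0],
    "cpp_vector": [0.9, 0.1],
    "cooking": [0.0, 1.0],
}
large_vectors = {
    "math_vectors": [1.0, 0.0, 0.0, 0.2],
    "cpp_vector": [0.9, 0.1, 1.0, 0.0],
    "cooking": [0.0, 1.0, 0.0, 0.0],
}

# A query about the C++ container, embedded at both sizes (made-up values).
result = two_stage_search(
    query_small=[1.0, 0.05],
    query_large=[0.9, 0.1, 1.0, 0.0],
    small_index=small_index,
    large_lookup=large_vectors.__getitem__,
    n_candidates=2,
    n_results=2,
)
print(result)  # stage 1 ranks math_vectors first; stage 2 promotes cpp_vector
```

Passing `large_lookup` as a callable leaves open where the large vectors come from: a separate store, a cache, or an embedding call made only for the shortlisted candidates.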
Journey Context:
High-dimensional embeddings capture fine-grained semantic distinctions but balloon storage (3072 dims = 12 KB per vector vs 2 KB for 512 dims, at 4 bytes per float32 value). Technical docs require distinguishing 'vector' (math) from 'vector' (C++), a distinction that 512 dims can miss. Two-stage retrieval uses cheap small embeddings for candidate generation and expensive large embeddings only to rank the shortlist. The 6x storage saving presumably assumes the large vectors are kept out of the main index, for example by computing them only for the shortlisted candidates. Common mistake: using large for all vectors (6x storage cost) or small for all (8% accuracy loss on technical terminology).
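The per-vector sizes quoted above follow from simple arithmetic, sketched here under the assumption that each dimension is stored as a 4-byte float32 with no index overhead or compression. The helper names and the one-million-vector corpus size are hypothetical, chosen only for illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed float32 storage

def vector_bytes(dims, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw size of one embedding vector in bytes."""
    return dims * bytes_per_value

def index_size_gib(num_vectors, dims):
    """Raw size of num_vectors embeddings in GiB, ignoring index overhead."""
    return num_vectors * vector_bytes(dims) / 2**30

small = vector_bytes(512)    # 2048 bytes  = 2 KiB
large = vector_bytes(3072)   # 12288 bytes = 12 KiB
ratio = large / small        # 6.0

# Hypothetical corpus of one million chunks.
for dims in (512, 3072):
    print(f"{dims} dims: {index_size_gib(1_000_000, dims):.2f} GiB")
```

Quantized storage (float16 or int8) would shrink both figures, but the 6x ratio between the two dimensionalities stays the same as long as both use the same value width.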
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:45:40.751207+00:00 - report_created - created