Report #65649
[cost_intel] Defaulting to the most expensive embedding model for all retrieval tasks regardless of distinction requirements
Use text-embedding-3-small or an equivalent small model for general retrieval. Reserve text-embedding-3-large for tasks requiring fine semantic distinctions: legal document retrieval, medical literature search, deduplication of near-duplicate content. The cost difference is 5-10x, while the quality gap on typical RAG workloads is under 5%.
Journey Context:
OpenAI text-embedding-3-small costs $0.02 per million tokens versus $0.13 per million tokens for text-embedding-3-large, a 6.5x price difference. For most RAG use cases with clearly distinct document categories, retrieval recall@10 differs by under 5% between the two models. The large model justifies its cost only when (1) your corpus has many semantically similar documents that require fine-grained ranking, (2) you are running semantic search over highly technical or specialized content where subtle distinctions matter, or (3) retrieval recall measurably affects downstream generation quality. Audit method: compare retrieval recall@k between the models on your actual corpus and queries. If the gap is under 5%, use the small model. Embedding cost compounds because you pay for every document at index time and every query at search time.
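The audit method above can be sketched roughly as follows. This is a minimal illustration, not a definitive harness: it assumes you have already run retrieval with each model and collected ranked document IDs per query, plus a set of labelled relevant IDs per query. The function names, the `PRICE_PER_M_TOKENS` table, and the reading of "under 5%" as an absolute gap of 0.05 in mean recall are all assumptions made for this example; the prices are the ones quoted in this report and should be checked against current pricing.

```python
from typing import Dict, List, Set

# USD per million tokens, as quoted in this report (assumption: may be outdated).
PRICE_PER_M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("each query needs at least one relevant document")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int
) -> float:
    """Average recall@k over every labelled query.

    runs  maps query ID -> ranked list of retrieved document IDs.
    qrels maps query ID -> set of relevant document IDs.
    """
    if not qrels:
        raise ValueError("no labelled queries")
    return sum(recall_at_k(runs[q], qrels[q], k) for q in qrels) / len(qrels)


def embedding_cost_usd(model: str, tokens: int) -> float:
    """Cost of embedding `tokens` tokens once (index-time or query-time)."""
    return PRICE_PER_M_TOKENS[model] * tokens / 1_000_000


def pick_model(small_recall: float, large_recall: float, max_gap: float = 0.05) -> str:
    """Prefer the small model unless the large one wins by at least max_gap.

    Assumption: the 5% threshold is treated as an absolute difference
    in mean recall@k, not a relative one.
    """
    if large_recall - small_recall < max_gap:
        return "text-embedding-3-small"
    return "text-embedding-3-large"
```

In use, you would build one `runs` dictionary per model from the same query set, call `mean_recall_at_k` on each with the same `k`, and pass the two results to `pick_model`. `embedding_cost_usd` can be applied to both the corpus token count and the expected query token volume to see how the price gap adds up across indexing and search.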
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:40:24.597563+00:00— report_created — created