Report #82165
[cost_intel] OpenAI embedding API costs scale linearly with volume, and API latency and rate limits are not optimized for high-throughput workloads
Use the OpenAI Batch API for embedding backfills to get a 50% discount; for sustained volume above 1M tokens/day, switch to local bge-large on an A100 to cut costs by 95% vs. the API
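A minimal sketch of preparing an embedding backfill for the Batch API, assuming the official `openai` Python SDK (v1+). Each JSONL line is one `/v1/embeddings` request. The helper names `build_embedding_batch_lines` and `submit_embedding_batch` and the `doc-{i}` id scheme are hypothetical, chosen for illustration.

```python
import json
from typing import Iterable


def build_embedding_batch_lines(
    texts: Iterable[str],
    model: str = "text-embedding-3-small",
    id_prefix: str = "doc",
) -> list[str]:
    """Return one JSONL line per input text in the Batch API request format."""
    lines = []
    for i, text in enumerate(texts):
        request = {
            # custom_id is how results are matched back to inputs,
            # because batch output order may differ from input order.
            "custom_id": f"{id_prefix}-{i}",
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        }
        lines.append(json.dumps(request))
    return lines


def submit_embedding_batch(jsonl_path: str) -> str:
    """Upload a JSONL request file and start a batch job; returns the batch id."""
    # Imported lazily so the line builder above works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(jsonl_path, "rb") as f:
        batch_file = client.files.create(file=f, purpose="batch")
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/embeddings",
        completion_window="24h",
    )
    return batch.id
```

Write the returned lines to a file (one per line) and pass its path to `submit_embedding_batch`. Then poll `client.batches.retrieve(batch_id)` until the status is `completed` and download the output file. Match each result to its input by `custom_id` rather than by position.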
Journey Context:
OpenAI text-embedding-3-small costs $0.02/1M tokens. That is cheap, but at 1B tokens/day it adds up to $20/day, or about $600/month. More critically, API latency and rate limits throttle throughput. For high-volume embedding (document ingestion, RAG indexing), self-hosting BAAI/bge-large-en-v1.5 on a single A100 (or even an L4) processes 10M tokens/day at <$0.001/1M tokens (electricity plus amortized hardware). The break-even is ~500M tokens/month: below that, stick with the API; above it, local inference saves 95% and eliminates rate limits. For one-time backfills, use the Batch API for 50% savings.
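The cost comparison above is linear arithmetic, sketched below using the report's per-token figures as defaults. The 30-day month is an assumption, and `local_fixed_per_month` is a hypothetical parameter for any GPU cost (rental or amortization) not already folded into the $0.001/1M marginal rate. Break-even is the fixed cost divided by the per-token saving.

```python
def monthly_costs(
    tokens_per_month: float,
    api_price_per_m: float = 0.02,         # text-embedding-3-small, $ per 1M tokens
    batch_discount: float = 0.5,           # Batch API discount cited in the report
    local_marginal_per_m: float = 0.001,   # report's upper bound for local $ per 1M tokens
    local_fixed_per_month: float = 0.0,    # hypothetical: GPU cost not in the marginal rate
) -> dict[str, float]:
    """Monthly spend in dollars for each option at a given token volume."""
    millions = tokens_per_month / 1_000_000
    return {
        "api": millions * api_price_per_m,
        "batch": millions * api_price_per_m * (1 - batch_discount),
        "local": local_fixed_per_month + millions * local_marginal_per_m,
    }


def break_even_tokens_per_month(
    local_fixed_per_month: float,
    api_price_per_m: float = 0.02,
    local_marginal_per_m: float = 0.001,
) -> float:
    """Monthly volume at which the per-token saving repays the local fixed cost."""
    saving_per_m = api_price_per_m - local_marginal_per_m
    if saving_per_m <= 0:
        raise ValueError("local inference never breaks even at these prices")
    return local_fixed_per_month / saving_per_m * 1_000_000
```

With the defaults, `monthly_costs(30e9)` (1B tokens/day over 30 days) gives $600 for the API, $300 via Batch, and $30 locally, a 95% saving. Running the break-even backwards, ~500M tokens/month corresponds to only about $9.50/month of fixed GPU cost (500 × $0.019). A GPU rented or bought solely for embedding would therefore push the break-even considerably higher.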
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:30:26.033789+00:00— report_created — created