Report #36139
[cost_intel] Processing 10M documents with text-embedding-3-small one by one: API rate limits and $2000/day costs
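Use the OpenAI Batch API for embedding jobs above ~100k requests: it costs 50% less than the synchronous endpoint and returns results asynchronously within a 24h completion window. A single batch accepts at most 50,000 requests, so larger jobs are split across several batch files. For real-time workloads above ~1M requests/day, self-host BGE-large-en-v1.5 on an A100. Self-hosting breaks even only when daily API spend exceeds the daily GPU bill: at $0.02/1M tokens, every $1/day of GPU cost needs ~50M tokens/day of traffic to pay for itself.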
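A minimal sketch of preparing a large job for the Batch API, assuming documents arrive as `(doc_id, text)` pairs. It writes one JSONL line per document in the Batch API request format (`custom_id`, `method`, `url`, `body`) and starts a new file every 50,000 requests, the per-batch cap documented at the time of writing (each input file is also limited to 200 MB, which this sketch does not check). The helper names `batch_request`, `write_batch_files`, and `submit_batch` are hypothetical; `submit_batch` expects an `openai.OpenAI` client and uses the SDK's `files.create` and `batches.create` calls.

```python
import json
from pathlib import Path
from typing import Iterable, List, Tuple

# Per-batch cap documented by OpenAI at time of writing (requests per batch, and
# embedding inputs per /v1/embeddings batch). Verify against current docs.
MAX_REQUESTS_PER_BATCH = 50_000
MODEL = "text-embedding-3-small"


def batch_request(doc_id: str, text: str, model: str = MODEL) -> dict:
    """One Batch API request line carrying a single embedding input."""
    return {
        "custom_id": doc_id,  # echoed in the output file; use it to join results
        "method": "POST",
        "url": "/v1/embeddings",
        "body": {"model": model, "input": text},
    }


def write_batch_files(docs: Iterable[Tuple[str, str]], out_dir: Path,
                      max_requests: int = MAX_REQUESTS_PER_BATCH) -> List[Path]:
    """Write (doc_id, text) pairs to JSONL files of at most max_requests lines each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    fh = None
    written = 0
    try:
        for doc_id, text in docs:
            if fh is None or written == max_requests:
                if fh is not None:
                    fh.close()  # current file is full; start the next one
                path = out_dir / f"batch_{len(paths):05d}.jsonl"
                paths.append(path)
                fh = path.open("w", encoding="utf-8")
                written = 0
            fh.write(json.dumps(batch_request(doc_id, text)) + "\n")
            written += 1
    finally:
        if fh is not None:
            fh.close()
    return paths


def submit_batch(client, path: Path):
    """Upload one JSONL file and start a batch job; client is an openai.OpenAI instance."""
    with path.open("rb") as f:
        uploaded = client.files.create(file=f, purpose="batch")
    return client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/embeddings",
        completion_window="24h",
    )
```

At 50,000 requests per file, 10M documents become 200 batch files. Output lines are not guaranteed to follow input order, which is why each request carries the document's `custom_id`. One input per request keeps that mapping trivial; packing several inputs into one request would not raise the 50,000-input cap for embedding batches.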
Journey Context:
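Sending one document per request lets per-request overhead (round-trip latency, request-count rate limits) dominate. The embeddings endpoint accepts a list of inputs per request, which removes most of that overhead; the Batch API goes further with a 50% discount and rate limits that are separate from the synchronous endpoint's. For latency-sensitive high volume, self-hosting avoids API rate limits entirely, leaving throughput bounded by your own GPUs. text-embedding-3-small is priced at $0.02/1M tokens ($0.01/1M via the Batch API); self-hosting is estimated at ~$0.02/1M tokens at scale, depending on GPU price and utilization, but requires GPU capex or rental.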
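The break-even point can be checked with simple arithmetic. The sketch below assumes self-hosting is a flat daily GPU bill (rental or amortized capex) and ignores engineering and ops time; it computes API spend at the $0.02/1M-token list price (half that via the Batch API), the daily volume at which the GPU bill equals API spend, and a GPU's daily capacity from a measured throughput. The function names, the GPU cost, and the throughput figure are hypothetical placeholders to replace with your own quotes and benchmarks.

```python
# Cost model: text-embedding-3-small API vs. a self-hosted GPU (e.g. BGE-large-en-v1.5 on an A100).
# API prices are USD per 1M tokens at time of writing; check current pricing.
API_PRICE_PER_M = 0.02
BATCH_PRICE_PER_M = API_PRICE_PER_M / 2  # Batch API: 50% discount

SECONDS_PER_DAY = 86_400


def api_cost_usd(tokens: float, price_per_m: float = API_PRICE_PER_M) -> float:
    """API spend for a given number of tokens."""
    return tokens / 1_000_000 * price_per_m


def break_even_tokens_per_day(gpu_cost_per_day: float,
                              price_per_m: float = API_PRICE_PER_M) -> float:
    """Daily token volume at which a flat daily GPU bill equals daily API spend.

    Staff and ops time are ignored, so the real break-even is higher than this.
    """
    return gpu_cost_per_day / price_per_m * 1_000_000


def gpu_capacity_tokens_per_day(tokens_per_second: float, utilization: float = 1.0) -> float:
    """Upper bound on tokens one GPU can embed per day at a measured throughput."""
    return tokens_per_second * SECONDS_PER_DAY * utilization


if __name__ == "__main__":
    # Hypothetical inputs: 10M docs x 500 tokens, $30/day GPU, 10k tokens/s at 70% utilization.
    total_tokens = 10_000_000 * 500
    print(f"sync API:     ${api_cost_usd(total_tokens):,.2f}")
    print(f"Batch API:    ${api_cost_usd(total_tokens, BATCH_PRICE_PER_M):,.2f}")
    print(f"break-even:   {break_even_tokens_per_day(30.0):,.0f} tokens/day")
    print(f"GPU capacity: {gpu_capacity_tokens_per_day(10_000, 0.7):,.0f} tokens/day")
```

With a hypothetical 500 tokens per document, 10M documents are 5B tokens: about $100 at list price or $50 via the Batch API. A GPU costing a hypothetical $30/day breaks even only at ~1.5B tokens/day, and only if its measured capacity can actually sustain that volume.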
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:08:16.990817+00:00— report_created — created