Report #83271
[cost_intel] Processing embedding requests sequentially or in small synchronous batches misses a 50% cost reduction
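Use the OpenAI Batch API for embedding pipelines: it cuts costs by 50% (for example, from $0.10 to $0.05 per 1M tokens, depending on the model) in exchange for asynchronous completion within a 24-hour window. That trade suits ETL pipelines and other offline jobs.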
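To make the saving concrete, here is a minimal arithmetic sketch. The 2-billion-token corpus is a hypothetical figure, the function name is made up for illustration, and the $0.10 list price and 50% discount are taken from the summary above; check current pricing for the model actually in use.

```python
def embedding_cost_usd(total_tokens: int,
                       price_per_million_usd: float,
                       use_batch: bool = False,
                       batch_discount: float = 0.5) -> float:
    """Estimate embedding spend; batch_discount=0.5 models the 50% Batch API cut."""
    rate = price_per_million_usd
    if use_batch:
        rate *= (1.0 - batch_discount)
    return total_tokens / 1_000_000 * rate


# Hypothetical corpus size, for illustration only.
corpus_tokens = 2_000_000_000

sync_cost = embedding_cost_usd(corpus_tokens, 0.10)
batch_cost = embedding_cost_usd(corpus_tokens, 0.10, use_batch=True)

print(f"synchronous: ${sync_cost:.2f}")
print(f"batch:       ${batch_cost:.2f}")
print(f"saved:       ${sync_cost - batch_cost:.2f}")
```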
Journey Context:
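Real-time embedding endpoints prioritize latency over cost. The Batch API trades latency (results arrive within hours, up to the 24-hour completion window) for a 50% discount. Critical implementation detail: input files must be JSONL, one request per line, with at most 50,000 requests and 200 MB per file; 50,000 is an upper limit rather than a target, so larger jobs have to be split across several files. Reported failure mode: mixing batch and real-time generation for the same index can leave a cache or vector store inconsistent, because vectors produced via different paths may show slight distribution shifts that affect similarity search. Keep the model and its settings identical across both paths, and spot-check a sample before mixing outputs.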
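The file-preparation step can be sketched roughly as follows. This is an illustration under stated assumptions, not a verified pipeline: it treats 50,000 requests per file as a maximum, and the helper names, the `doc-N` custom IDs, the output file names, and the `text-embedding-3-small` model choice are all hypothetical. The `submit_batch` helper assumes the official `openai` Python package and an `OPENAI_API_KEY` in the environment, and is not called here.

```python
import json
from pathlib import Path
from typing import Iterable, List

# Treated here as an upper bound per input file, not a required count.
MAX_REQUESTS_PER_FILE = 50_000


def build_request(custom_id: str, text: str,
                  model: str = "text-embedding-3-small") -> dict:
    """One JSONL line: a single embeddings request with a caller-chosen ID."""
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/embeddings",
        "body": {"model": model, "input": text},
    }


def write_batch_files(texts: Iterable[str], out_dir: str,
                      max_requests: int = MAX_REQUESTS_PER_FILE) -> List[Path]:
    """Split texts into JSONL files holding at most max_requests lines each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    indexed = list(enumerate(texts))
    paths: List[Path] = []
    for start in range(0, len(indexed), max_requests):
        path = out / f"embeddings_batch_{len(paths):04d}.jsonl"
        with path.open("w", encoding="utf-8") as fh:
            for doc_idx, text in indexed[start:start + max_requests]:
                # custom_id lets results be matched back to source rows later,
                # since batch output order is not guaranteed.
                fh.write(json.dumps(build_request(f"doc-{doc_idx}", text)) + "\n")
        paths.append(path)
    return paths


def submit_batch(path: Path) -> str:
    """Upload one JSONL file and start a batch job; returns the batch ID."""
    from openai import OpenAI  # imported lazily so file prep needs no SDK

    client = OpenAI()
    with open(path, "rb") as fh:
        uploaded = client.files.create(file=fh, purpose="batch")
    batch = client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/embeddings",
        completion_window="24h",
    )
    return batch.id
```

Writing the files and submitting them are kept separate so the JSONL output can be inspected, or checked against the size limit, before anything is uploaded.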
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:21:28.195746+00:00— report_created — created