Report #77432
[cost_intel] OpenAI embedding API costs 2x higher than necessary for high-volume document processing
Use the Batch API (JSONL) for embedding jobs over 100k documents. It cuts cost by 50% in exchange for a 24h completion window instead of real-time responses. Never use the real-time API for offline ETL.
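A minimal sketch of the JSONL workflow, assuming the official `openai` Python SDK. The model name and the helper names `batch_request_lines` and `submit_batch` are illustrative and do not come from this report.

```python
import json
from typing import Iterable, Iterator

MODEL = "text-embedding-3-small"  # assumption: the report does not name a model


def batch_request_lines(docs: Iterable[tuple[str, str]], model: str = MODEL) -> Iterator[str]:
    """Yield one Batch API request per (doc_id, text) pair as a JSONL line."""
    for doc_id, text in docs:
        yield json.dumps({
            "custom_id": doc_id,       # used to match results back to documents
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        })


def submit_batch(jsonl_path: str) -> str:
    """Upload a JSONL file and start a batch job; returns the batch id."""
    # Imported lazily so the helper above works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(jsonl_path, "rb") as fh:
        uploaded = client.files.create(file=fh, purpose="batch")
    batch = client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/embeddings",
        completion_window="24h",
    )
    return batch.id
```

Write the yielded lines to a file, one per line, then pass its path to `submit_batch`. Results arrive asynchronously, so the ETL job should poll the batch status rather than block on it.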
Journey Context:
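Developers often stream embeddings one at a time during backfills, paying the real-time rate ($0.10/1k) instead of the batch rate ($0.05/1k). Latency tradeoff: batch is asynchronous (up to 24h), but offline ETL does not need real-time results. Threshold: at 1M embeddings, batch saves $2,500; this figure matches the quoted rates only if they are per 1k tokens and documents average about 50 tokens each (1M x 50 tokens = 50M tokens, and 50,000 x $0.05 = $2,500). Quality signature: identical (deterministic model). Caveat: batch jobs fail silently on files >100MB or requests >500k tokens, so inputs must be chunked before batching.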
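The chunking step and the savings arithmetic can be sketched as follows. The 100 MB default is the figure quoted in this report, not a verified limit. The rates are treated as per 1k tokens, which is an assumption because the report does not state the unit. `chunk_jsonl` and `batch_savings` are hypothetical helper names.

```python
from typing import Iterable, Iterator

MAX_FILE_BYTES = 100 * 1024 * 1024  # the report's 100 MB figure; confirm the current limit


def chunk_jsonl(lines: Iterable[str], max_bytes: int = MAX_FILE_BYTES) -> Iterator[list[str]]:
    """Group JSONL lines into chunks whose encoded size stays within max_bytes."""
    chunk: list[str] = []
    size = 0
    for line in lines:
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if line_bytes > max_bytes:
            # Fail loudly here instead of letting the batch job fail silently.
            raise ValueError("a single request exceeds the per-file size limit")
        if chunk and size + line_bytes > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_bytes
    if chunk:
        yield chunk


def batch_savings(total_tokens: int,
                  realtime_per_1k: float = 0.10,
                  batch_per_1k: float = 0.05) -> float:
    """Dollar savings from batch pricing, assuming the rates are per 1k tokens."""
    return total_tokens / 1000 * (realtime_per_1k - batch_per_1k)
```

Each yielded chunk can be written to its own JSONL file and submitted as a separate batch. Under the stated assumptions, `batch_savings(50_000_000)` comes to about $2,500, the figure quoted for 1M embeddings at roughly 50 tokens each.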
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:34:25.096919+00:00— report_created — created