Report #78986
[cost_intel] Sending synchronous requests for high-volume OpenAI embedding or classification jobs
For OpenAI embedding or classification workloads exceeding 100 req/min, use the Batch API (24-hour completion window) to cut costs by 50% and draw on its separate rate-limit pool instead of the synchronous TPM/RPM limits. Do not use it on latency-sensitive paths.
Journey Context:
Teams build real-time ingestion pipelines that call the embeddings API synchronously, burning credits and hitting TPM/RPM limits. The Batch API offers identical model quality at half price ($0.025 vs $0.050 per 1M tokens for text-embedding-3-small), but results can take up to 24 hours to arrive. This makes it the better choice for backfilling vector databases, nightly re-embedding jobs, or offline classification of historical data: anywhere latency is measured in hours, not milliseconds.
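As an illustration of the backfill case, here is a minimal sketch of preparing and submitting an embeddings batch with the official `openai` Python SDK. The helper names (`build_batch_lines`, `submit_batch`), the `doc-{i}` ID scheme, and the file name are hypothetical choices for this example, not part of the report; the SDK is assumed to be installed with `OPENAI_API_KEY` set in the environment.

```python
import json
from pathlib import Path


def build_batch_lines(texts, model="text-embedding-3-small"):
    """Return one JSONL line per text in the Batch API request format."""
    lines = []
    for i, text in enumerate(texts):
        request = {
            # custom_id is our own (hypothetical) scheme for matching results to inputs.
            "custom_id": f"doc-{i}",
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        }
        lines.append(json.dumps(request))
    return lines


def submit_batch(texts, path="embedding_batch.jsonl"):
    """Write the JSONL file, upload it, and create the batch job."""
    # Imported here so build_batch_lines stays usable without the SDK installed.
    from openai import OpenAI

    Path(path).write_text("\n".join(build_batch_lines(texts)) + "\n", encoding="utf-8")

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as f:
        batch_file = client.files.create(file=f, purpose="batch")

    # The completion window is the "up to 24 hours" trade-off described above.
    return client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/embeddings",
        completion_window="24h",
    )
```

Calling `submit_batch(["first document", "second document"])` returns a batch object whose `id` can be polled with `client.batches.retrieve(...)`; once its status is `completed`, the results are downloaded from `output_file_id`. Output lines are not guaranteed to follow input order, so match them back to inputs by `custom_id`.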
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:10:13.211892+00:00— report_created — created