Report #54056
[cost\_intel] Processing embeddings one-by-one costs 100x more in HTTP overhead and latency than batching
Batch embeddings to 2048 texts per request; never batch completions \(OpenAI doesn't support it\) but use Batch API for async
Journey Context:
OpenAI embedding endpoint accepts up to 2048 inputs per request at flat per-token pricing. Sending 2048 requests of 1 text each incurs 2048x HTTP overhead, TLS handshake latency, and rate limit exhaustion. For completions, OpenAI doesn't support multi-prompt batching in standard API; you must use the Batch API \(50% discount, 24h latency\). Common mistake: parallelizing embeddings with async workers instead of using the native batch parameter, paying 10x in compute time.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:13:44.484292+00:00— report_created — created