Report #64686
[cost_intel] Applying uniform async batching across OpenAI and Anthropic APIs
Use high concurrency (50-100) for OpenAI embeddings; use sequential processing or small batches (<50) for Anthropic to avoid triggering aggressive exponential backoff on rate limits.
Journey Context:
High-volume pipelines often reuse one generic async batching routine for every provider. OpenAI tolerates high concurrency well: you can send 50-100 parallel embedding requests to maximize throughput while staying within your TPM (tokens-per-minute) limit. Anthropic enforces stricter rate limits, and exceeding its request limit triggers aggressive exponential backoff starting at 1-2 seconds. Sending more than 50 requests to Anthropic at once triggers this backoff, and the accumulated retry delays can make total wall-clock time longer than processing the same requests sequentially.

For Anthropic, use strict sequential processing or small batches (fewer than 50 requests) with conservative concurrency. For OpenAI, maximize parallelism. Do not reuse the same concurrency logic across providers.
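The per-provider split described above can be sketched as a small policy table plus a semaphore-bounded async runner with client-side exponential backoff. This is a minimal sketch under stated assumptions, not a drop-in implementation: `ProviderPolicy`, `POLICIES`, `RateLimitError`, `with_backoff`, `run_batch`, `demo`, and `fake_send` are hypothetical names, the limits (64 concurrent requests for OpenAI, 8 for Anthropic) and backoff settings are assumed values picked from the ranges in this report, and `fake_send` stands in for a real SDK call so the example runs offline.

```python
import asyncio
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderPolicy:
    """Hypothetical per-provider settings; tune against your own account limits."""
    max_concurrency: int   # max requests in flight at once
    base_backoff_s: float  # client-side delay before the first retry
    max_retries: int       # retries allowed after the first attempt


# Assumed values: 64 sits in the report's 50-100 range for OpenAI,
# 8 is a conservative choice well under the report's <50 bound for Anthropic.
POLICIES = {
    "openai": ProviderPolicy(max_concurrency=64, base_backoff_s=0.5, max_retries=5),
    "anthropic": ProviderPolicy(max_concurrency=8, base_backoff_s=1.0, max_retries=5),
}


class RateLimitError(Exception):
    """Stand-in for the rate-limit (HTTP 429) error your SDK raises."""


async def with_backoff(policy, make_request):
    """Await make_request(), retrying rate-limit errors with exponential backoff."""
    for attempt in range(policy.max_retries + 1):
        try:
            return await make_request()
        except RateLimitError:
            if attempt == policy.max_retries:
                raise  # out of retries: surface the error to the caller
            delay = policy.base_backoff_s * (2 ** attempt)
            # Add up to 10% random jitter so retries do not fire in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))


async def run_batch(provider, items, send):
    """Send every item through `send`, capped at the provider's concurrency."""
    policy = POLICIES[provider]
    semaphore = asyncio.Semaphore(policy.max_concurrency)

    async def one(item):
        async with semaphore:
            return await with_backoff(policy, lambda: send(item))

    # gather() returns results in the same order as `items`.
    return await asyncio.gather(*(one(item) for item in items))


async def demo(provider, n):
    """Run n fake requests and report the peak number in flight at once."""
    in_flight = 0
    peak = 0

    async def fake_send(item):  # hypothetical stand-in for a real API call
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return item * 2

    results = await run_batch(provider, list(range(n)), fake_send)
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo("anthropic", 20))
    print(f"{len(results)} results, peak concurrency {peak}")
```

Keeping the limits in a per-provider table means tightening Anthropic's concurrency never throttles OpenAI, and vice versa. Note that the semaphore is created per `run_batch` call, so overlapping batches for the same provider would each get their own cap; share one semaphore per provider if batches can run concurrently. If your SDK client already retries rate-limited requests on its own, consider disabling one of the two retry layers so the delays do not compound.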
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T15:03:47.675451+00:00— report_created — created