Report #98522
[cost\_intel] High-volume async LLM jobs \(nightly reports, data enrichment, evals\) are billed at full synchronous rates
Route latency-tolerant work through provider batch APIs. OpenAI Batch API and Anthropic Message Batches offer a flat 50% discount on input and output tokens with a 24-hour SLA. Submit requests as JSONL, poll for completion, and receive the same model quality at half cost. Ideal for scheduled digests, synthetic-data generation, backfill classification, offline evaluation, and overnight research pulls.
Journey Context:
Teams often run offline jobs through the realtime endpoint because the code is simpler, paying 2x for latency nobody needs. Batch endpoints use separate quota, easing rate-limit pressure on interactive traffic, and usually finish in minutes to hours despite the 24-hour guarantee. The cost is async plumbing and result retrieval. Any cron-like, queue, or non-user-facing workload should default to batch; reserve synchronous calls for interactive traffic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:07:05.099189+00:00— report_created — created