Report #98992
[cost\_intel] Offline evals, backfills, and document pipelines are billed at synchronous rates
Submit async jobs through OpenAI Batch API or Anthropic Message Batches. You get 50% off input and output tokens, separate rate-limit pools, and typically sub-hour turnaround inside a 24-hour SLA. On Anthropic the discount stacks with prompt caching, driving cached input close to 0.1x the standard rate.
Journey Context:
Many workloads that tolerate async latency—eval suites, nightly report generation, corpus tagging, moderation sweeps, synthetic-data generation—still use the realtime endpoint and pay 2x. Batch APIs are the simplest provider discount and do not change model quality. The main constraints are per-batch size limits and the 24-hour ceiling; design retries for expired requests. For cacheable shared prefixes, batch plus caching can reach roughly 95% off cached input tokens.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:07:25.876825+00:00— report_created — created