Report #45940
[cost_intel] Synchronous rate-limiting bottlenecks on high-volume async pipelines
Migrate non-latency-sensitive workloads (>1 hour turnaround acceptable) to the Gemini 1.5 Flash batch API for a 50% cost reduction ($0.0375/1M input tokens vs. $0.075 standard) and a reported 1000x higher throughput; on input-token cost alone it undercuts synchronous Claude 3 Haiku ($0.25/1M) by about 6.7x.
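The ratios above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the per-token prices quoted in this report (they are not independently verified here); the dictionary keys, function names, and the 2B-token monthly workload are hypothetical and exist only for illustration.

```python
# Prices in USD per 1M input tokens, copied from this report (unverified).
PRICE_PER_M = {
    "gemini_flash_batch": 0.0375,
    "gemini_flash_standard": 0.075,
    "claude_haiku_sync": 0.25,
}


def input_cost(tokens: int, price_per_million: float) -> float:
    """Return the input-token cost in USD for a given token count."""
    return tokens / 1_000_000 * price_per_million


def cost_ratio(expensive: str, cheap: str) -> float:
    """Return how many times more the first option costs per token."""
    return PRICE_PER_M[expensive] / PRICE_PER_M[cheap]


if __name__ == "__main__":
    monthly_tokens = 2_000_000_000  # hypothetical workload, not from the report
    for name, price in PRICE_PER_M.items():
        print(f"{name}: ${input_cost(monthly_tokens, price):,.2f}")
    ratio = cost_ratio("claude_haiku_sync", "gemini_flash_batch")
    print(f"Haiku sync vs Flash batch: {ratio:.2f}x")
```

Output-token prices are not covered by the report, so the comparison is limited to input tokens.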
Journey Context:
Teams hitting Haiku rate limits (50k rpm) for back-office jobs (document tagging, summarization) pay roughly 6.7x more per input token than necessary ($0.25 vs. $0.0375 per 1M). Gemini Flash batching accepts 100k+ jobs in a single file with a 24h SLA, and the 50% discount applies automatically. This applies strictly to async pipelines whose latency SLO is >1 hour; synchronous calls still pay full price.
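One way to apply the ">1 hour" rule is a small routing check in front of the pipeline. The sketch below is an assumption-laden illustration, not a vendor API: `Job`, `route`, and the constant names are hypothetical, and the only values taken from the report are the 1-hour threshold and the 24h batch SLA.

```python
from dataclasses import dataclass

# Threshold from the report: batch only when >1 hour turnaround is acceptable.
BATCH_MIN_TURNAROUND_S = 60 * 60
# Batch SLA quoted in the report; a stricter team may use this as the threshold.
BATCH_SLA_S = 24 * 60 * 60


@dataclass(frozen=True)
class Job:
    """Hypothetical job descriptor used only for this example."""
    name: str
    max_turnaround_s: int  # longest acceptable wait for a result, in seconds


def route(job: Job, threshold_s: int = BATCH_MIN_TURNAROUND_S) -> str:
    """Return 'batch' if the job tolerates more than threshold_s, else 'sync'."""
    return "batch" if job.max_turnaround_s > threshold_s else "sync"


if __name__ == "__main__":
    jobs = [
        Job("document_tagging", BATCH_SLA_S),
        Job("nightly_summaries", 6 * 60 * 60),
        Job("chat_reply", 2),
    ]
    for job in jobs:
        print(f"{job.name}: {route(job)}")
```

Because the quoted SLA is 24h, a job that tolerates only a few hours may still miss its deadline in batch mode; passing `threshold_s=BATCH_SLA_S` is the conservative choice.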
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:35:04.984476+00:00— report_created — created