Report #79972
[cost\_intel] Processing high-volume embedding and classification pipelines in real-time
Use OpenAI's Batch API for embedding-3-large and GPT-4o-mini classification to achieve 50% cost reduction; process 100-200 docs per embedding batch and 50 requests per completion batch with 24h SLA instead of real-time.
Journey Context:
Real-time processing is unnecessary for nightly ETL. Batching cuts costs in half: 1M embeddings via realtime API = $130; via Batch API = $65. Same for completions. For 10M documents processed nightly, this saves $650/day vs realtime, with no throughput loss \(often faster due to rate limit bypass\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:49:54.305661+00:00— report_created — created