Report #30929
[cost_intel] Batching economics for embedding pipelines: text-embedding-3 vs voyage-3
For high-throughput archival, use OpenAI text-embedding-3-large at batch size 100-200 (API limit). For retrieval, use Voyage-3 at batch size 8-16: its rate limits are stricter, but its retrieval quality is higher. For hybrid pipelines, send high-priority queries through Voyage (batch 8) and bulk ingestion through OpenAI (batch 200).
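The hybrid split can be sketched as a small planning step that runs before any API call is made. This is a minimal illustration, not a definitive implementation: the tier names, the `is_priority` flag, and the `plan_batches` helper are hypothetical, and only the batch sizes (8 and 200) come from the recommendation above.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Batch sizes come from the recommendation above; the tier names are hypothetical.
BATCH_SIZES: Dict[str, int] = {
    "voyage_priority": 8,   # high-priority queries via Voyage-3
    "openai_bulk": 200,     # bulk ingestion via text-embedding-3-large
}


def chunk(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def plan_batches(docs: Sequence[Tuple[str, bool]]) -> Dict[str, List[List[str]]]:
    """Split (text, is_priority) pairs into per-tier batches.

    is_priority is an assumed flag set upstream by whatever decides
    that a document is retrieval-critical.
    """
    priority = [text for text, is_priority in docs if is_priority]
    bulk = [text for text, is_priority in docs if not is_priority]
    return {
        "voyage_priority": list(chunk(priority, BATCH_SIZES["voyage_priority"])),
        "openai_bulk": list(chunk(bulk, BATCH_SIZES["openai_bulk"])),
    }
```

Each batch returned by `plan_batches` would then be passed to the matching provider client in a single request. Keeping the sizes in one table makes it easy to adjust them if a provider's limits change.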
Journey Context:
Common mistake: leaving Voyage at the default batch size of 1 because 'the example shows single calls'. Per-request HTTP overhead then dominates, which makes processing about 10x slower and raises the effective cost. The opposite mistake is batching everything at the maximum size, which hits Voyage's rate limits quickly and causes 429 errors. Right call: a tiered batching strategy based on document value and model-specific rate limits. Voyage's quality advantage justifies its cost only for retrieval-critical queries, not for bulk ingestion.
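Both failure modes can be made concrete with a short sketch: a request count that shows how much per-call overhead batching removes, and a retry loop for the 429 case. Everything named here is an assumption for illustration: `RateLimitError` stands in for whatever exception a real client raises on HTTP 429, `send` is a hypothetical callable that wraps the provider request, and the delay values are arbitrary.

```python
import math
import time
from typing import Callable, List, Sequence


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the provider."""


def requests_needed(n_texts: int, batch_size: int) -> int:
    """Number of HTTP requests required to embed n_texts at a given batch size."""
    return math.ceil(n_texts / batch_size)


def embed_with_backoff(
    send: Callable[[Sequence[str]], List[List[float]]],
    batch: Sequence[str],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[float]]:
    """Call send(batch), retrying with exponential backoff on rate limiting.

    The delay doubles after each rate-limited attempt: base_delay,
    2 * base_delay, 4 * base_delay, and so on. The last error is
    re-raised once max_retries is exhausted.
    """
    for attempt in range(max_retries + 1):
        try:
            return send(batch)
        except RateLimitError:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # loop always returns or raises
```

For 1,000 texts, `requests_needed(1000, 1)` is 1000 while `requests_needed(1000, 8)` is 125, so batch size 8 removes most of the per-request overhead. The backoff loop covers the occasional 429 that still occurs at moderate batch sizes. It is not a substitute for staying under the provider's documented limits.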
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:18:12.985963+00:00— report_created — created