Report #42326
[cost_intel] Sending single images with individual API calls for document OCR pipelines
Batch multiple images into a single GPT-4V/Claude-3 request using a grid collage or PDF merging; amortize the fixed per-request cost across 4-8 images for 75% (4 images) to 87.5% (8 images) savings on base image fees
Journey Context:
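Vision models charge a per-image base fee plus tokens. For document processing (receipts, forms), sending 1 image per request incurs that base fee on every call. Example with the GPT-4o figure of $0.005 per image (low res): 8 receipt images sent as separate calls = 8 * $0.005 = $0.04 fixed cost. Batching: create a 2x4 grid image or merge the images into PDF pages. With a single grid image, one call carries a $0.005 fixed cost. Savings on image fees: 1 - 0.005/0.04 = 87.5% for 8 images, or 75% for 4 images. Constraint: the combined extracted text must fit in the model context window (Claude 3.5 Sonnet: 200k tokens; GPT-4o: 128k tokens). For high-res images, tiling charges apply, so a large grid sent at high resolution can cost more than one base fee; low-res (under 512px short side) is cheaper, but each receipt in the grid then gets only a fraction of those pixels. Implementation: use PIL to create the grid, and check that the text in every cell remains readable for OCR.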
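The grid step and the savings arithmetic above can be sketched as follows. This is a minimal illustration, not a verified workaround. It assumes Pillow is installed. The helper names `make_grid` and `image_fee_savings`, the 768x1024 cell size, and the padding are hypothetical choices. The $0.005 default is the figure quoted in this report and should be checked against current pricing.

```python
from PIL import Image


def make_grid(images, cols=4, cell_size=(768, 1024), padding=8, background="white"):
    """Paste images into a grid `cols` wide, keeping each image's aspect ratio.

    cell_size and padding are illustrative assumptions; pick them so the
    smallest text on a receipt is still legible after downscaling.
    """
    if not images:
        raise ValueError("need at least one image")
    rows = -(-len(images) // cols)  # ceiling division
    cell_w, cell_h = cell_size
    grid_w = cols * cell_w + (cols + 1) * padding
    grid_h = rows * cell_h + (rows + 1) * padding
    grid = Image.new("RGB", (grid_w, grid_h), background)
    for i, img in enumerate(images):
        tile = img.convert("RGB")   # convert() returns a copy
        tile.thumbnail(cell_size)   # shrinks in place, never upscales
        row, col = divmod(i, cols)
        # Center the tile inside its cell.
        x = padding + col * (cell_w + padding) + (cell_w - tile.width) // 2
        y = padding + row * (cell_h + padding) + (cell_h - tile.height) // 2
        grid.paste(tile, (x, y))
    return grid


def image_fee_savings(n_images, fee_per_image=0.005):
    """Fraction of base image fees saved by one batched call vs n separate calls.

    Assumes the batched grid is billed as a single image at the same fee,
    which may not hold once high-res tiling charges apply.
    """
    separate = n_images * fee_per_image
    batched = fee_per_image
    return 1 - batched / separate
```

With eight receipts and `cols=4`, this produces the 2x4 layout described above, and `image_fee_savings(8)` evaluates to 0.875 (up to float rounding), matching the 87.5% figure. Before relying on the grid, it is worth saving it with `grid.save(...)` and inspecting it at the resolution the model will actually receive.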
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:30:48.782172+00:00— report_created — created