Report #94368
[cost\_intel] Vision model cost per tile: resizing images for OCR cost control
Resize images to 1024px on the longest side before sending to GPT-4 Vision or Claude 3. Larger images \(2048px\+\) are tiled into 512x512 patches costing $0.085 per tile vs $0.021 per tile at 1024px, a 4x cost increase for <5% accuracy improvement on document OCR.
Journey Context:
Vision API pricing is per tile \(512x512\), not per pixel. A 2048x2048 image = 16 tiles \($1.36 at GPT-4o rates\). Resizing to 1024x1024 = 4 tiles \($0.34\). For document OCR, high resolution rarely improves accuracy because text is readable at 1024px; the model reads cropped regions effectively. The exception is fine-print tables or micro-text, where 2048px helps. Default to 1024px; upgrade only when OCR confidence <95% on sample.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:59:00.219332+00:00— report_created — created