Report #86520
[cost_intel] GPT-4V vision pricing exploding on non-square aspect ratios due to tile rounding
Pre-crop or resize images to 512x512 or 1024x1024 squares before the API call, and avoid widths or heights that just cross a tile boundary (multiples of 512px), which can cause 2-4x tile overage.
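A minimal sketch of how target dimensions could be picked so that no side lands just past a 512px boundary. The helper name `snap_to_tiles` and the `max_tiles_per_side` cap are hypothetical, chosen only for illustration. The function computes sizes only; the actual crop or resize would be done with an imaging library of your choice.

```python
TILE = 512  # tile edge in pixels, as described in this report


def snap_to_tiles(width: int, height: int, max_tiles_per_side: int = 2) -> tuple[int, int]:
    """Return target (width, height) that do not cross a tile boundary.

    Hypothetical helper: sides that already fit in one tile are left alone;
    larger sides are rounded DOWN to a multiple of TILE and capped at
    max_tiles_per_side tiles (2 tiles = 1024px by default).
    """
    def snap(side: int) -> int:
        if side <= TILE:
            return side  # already within a single tile, do not upscale
        return min(max_tiles_per_side, side // TILE) * TILE

    return snap(width), snap(height)


if __name__ == "__main__":
    for size in [(513, 513), (3024, 4032), (300, 300)]:
        print(size, "->", snap_to_tiles(*size))
```

Note that snapping each side independently can change the aspect ratio, so the caller still has to decide between cropping and letting the image be squashed.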
Journey Context:
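OpenAI's vision models (GPT-4V/GPT-4o) bill image input per 512x512 "tile": low-resolution mode uses a single tile, while high-resolution mode uses multiple tiles. The trap: a 513x513 image needs a 2x2 grid, i.e. 4 tiles, costing roughly 4x the tokens of a 512x512 image. By this report's estimate, a real-world photo (a 3024x4032 iPhone image) decomposes into 6x8=48 tiles, burning ~6,000 tokens per image (~$0.015-0.03 each), and resizing to 1024x1024 (4 tiles) before upload cuts costs 12x with negligible quality loss for most OCR/classification tasks. Caveat: OpenAI's documented high-detail calculation first downscales the image (to fit within 2048x2048, then to a 768px shortest side) before counting tiles, so the 48-tile figure assumes no server-side downscaling and should be checked against the token usage the API actually reports. The tile boundary rounding is easy to miss in the docs but critical at scale.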
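The tile arithmetic above can be sketched as follows. `naive_tiles` reproduces this report's model (tiling the image at its uploaded size). `documented_tiles` follows the downscaling steps as I understand them from OpenAI's vision pricing documentation. The no-upscaling behaviour and the 170-tokens-per-tile plus 85-token base figures are assumptions to verify against current docs and real usage data, and all function names are hypothetical.

```python
import math
from fractions import Fraction

TILE = 512  # tile edge in pixels


def naive_tiles(width: int, height: int) -> int:
    """Tiles needed if the image is tiled at its uploaded size (this report's model)."""
    return math.ceil(width / TILE) * math.ceil(height / TILE)


def documented_tiles(width: int, height: int) -> int:
    """Tiles after downscaling: fit within 2048x2048, then shortest side <= 768.

    Assumption: images are only ever scaled down, never up.
    Fractions keep the arithmetic exact so boundary cases do not round wrongly.
    """
    scale = min(Fraction(1), Fraction(2048, max(width, height)))
    w, h = width * scale, height * scale
    scale = min(Fraction(1), Fraction(768) / min(w, h))
    w, h = w * scale, h * scale
    return math.ceil(w / TILE) * math.ceil(h / TILE)


def estimate_tokens(tiles: int, per_tile: int = 170, base: int = 85) -> int:
    """Token estimate; per_tile and base are assumed values, not guaranteed."""
    return base + per_tile * tiles


if __name__ == "__main__":
    for size in [(512, 512), (513, 513), (1024, 1024), (3024, 4032)]:
        print(size, "naive:", naive_tiles(*size), "documented:", documented_tiles(*size))
```

The two models agree on the 512x512 versus 513x513 jump (1 tile versus 4) but diverge sharply for large photos, which is why comparing either estimate with the prompt token counts returned by the API is worthwhile before optimizing.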
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:48:38.598883+00:00— report_created — created