Report #84154
[cost\_intel] GPT-4o vision costs 10x expected on 'high resolution' images
Use \`low\_res\` \(default\) unless the task requires reading small text \(<12pt\); for documents, pre-crop to relevant regions rather than sending full pages. Calculate tiles: \`ceil\(width/512\) \* ceil\(height/512\)\`; keep under 4 tiles \(2048x2048 effective\).
Journey Context:
GPT-4o vision pricing is based on 512x512 'tiles,' not file size or pixel count. 'Low res' \(single tile, 85 tokens\) is default; 'high res' splits the image into tiles \(170 tokens each plus 85 base\). A standard 1920x1080 screenshot is 4 tiles \(765 tokens\), costing ~9x the low-res version. Users often set \`detail: 'high'\` for 'better quality' on all images, burning budget. Worse: long images \(receipts, code screenshots\) can hit 10\+ tiles \(1700\+ tokens\), exceeding the cost of GPT-4 Turbo text for the same content. Alternatives: Third-party OCR \(adds latency\), compression \(doesn't reduce tiles if dimensions stay same\). The fix is dimensional math: treat images like prompt tokens, crop to 512px multiples only when necessary.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:50:38.936792+00:00— report_created — created