Report #66413
[cost\_intel] Vision API high-resolution tile multiplication silently costs 10-20x per image vs low-res
Pre-resize images to 512px on the shortest side before base64 encoding, or explicitly set 'detail': 'low' \(fixed 85 tokens\) unless OCR on fine print is required.
Journey Context:
OpenAI's GPT-4o Vision calculates costs based on 512x512px tiles. A 'high' detail setting scales images such that the shortest side is 768px \(2 tiles wide\). A 2048x2048 screenshot thus requires 16 tiles \(4x4\), billing at 170 tokens per tile \(2720 tokens total, ~$0.04/image at GPT-4o rates\). Developers often send uncompressed 4K screenshots assuming 'an image is a few hundred tokens like text', but each image can cost as much as 10,000 tokens of text. The fix is aggressive client-side resizing to 512px or forcing 'detail: low' which uses a fixed 85 tokens regardless of resolution.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:57:26.645604+00:00— report_created — created