Report #25394
[cost_intel] Why did my vision API call cost $0.50 for a single screenshot?
Resize images so the longest side is at most 1024px before sending. Vision models bill by 512px tiles (85 tokens each), so a 4096x4096px image consumes 64 tiles (about $0.08) versus 4 tiles (about $0.005) at 1024x1024px, a 16x reduction, with negligible OCR quality difference for document text.
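A minimal sketch of the resize arithmetic, assuming the goal is to fit the whole image inside a 1024px box (the 2×2 tile case priced at 4 tiles above). The function name `fit_within` and the `max_side` parameter are hypothetical. The code only computes target dimensions; the actual resampling would be done with whatever image library you already use.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 1024) -> Tuple[int, int]:
    """Return (width, height) scaled down so the longest side is <= max_side.

    Aspect ratio is preserved. Images that already fit are returned unchanged,
    so small images are never upscaled.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to whole pixels, clamped to the range [1, max_side].
    new_w = max(1, min(max_side, round(width * scale)))
    new_h = max(1, min(max_side, round(height * scale)))
    return new_w, new_h


if __name__ == "__main__":
    print(fit_within(3840, 2160))  # 4K screenshot
    print(fit_within(4032, 3024))  # phone photo
    print(fit_within(800, 600))    # already small enough
```

Bounding the longest side rather than the shortest keeps both dimensions within two 512px tiles, whatever the aspect ratio.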
Journey Context:
Developers send native 4K screenshots or phone photos (4032x3024px) directly to GPT-4o Vision or Claude 3.5 Sonnet, unaware that pricing is per tile, so cost scales with image area. OpenAI charges $0.001275 per 512x512 tile (85 tokens). A standard 4K monitor screenshot (3840x2160) rounds up to 8 tiles wide × 5 tiles tall = 40 tiles = $0.051 just for the image. A 4032x3024 phone photo is 8 × 6 = 48 tiles (about $0.061), and higher-resolution photos can exceed 100 tiles. Resizing to fit within 1024px (2×2 tiles) cuts the image cost by 10x for the 4K screenshot and 12x for the phone photo while preserving text readability for OCR tasks.
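The tile arithmetic can be sketched as follows. This is an estimate under this report's simplified model only: the constants are the figures quoted here (512px tiles, 85 tokens and $0.001275 per tile), and the helper names `tile_count` and `image_cost_usd` are hypothetical. Real provider billing may involve extra steps, such as downscaling before tiling or a fixed base charge, so check current pricing documentation before relying on the numbers.

```python
import math

# Figures quoted in this report; treat them as assumptions, not current pricing.
TILE_PX = 512            # tile edge length in pixels
TOKENS_PER_TILE = 85     # tokens billed per tile
USD_PER_TILE = 0.001275  # price per tile


def tile_count(width: int, height: int, tile_px: int = TILE_PX) -> int:
    """Number of tiles needed to cover the image, rounding each axis up."""
    return math.ceil(width / tile_px) * math.ceil(height / tile_px)


def image_cost_usd(width: int, height: int) -> float:
    """Estimated image cost in USD under the per-tile model above."""
    return tile_count(width, height) * USD_PER_TILE


if __name__ == "__main__":
    samples = {
        "4K screenshot": (3840, 2160),
        "phone photo": (4032, 3024),
        "resized screenshot": (1024, 576),
    }
    for label, (w, h) in samples.items():
        tiles = tile_count(w, h)
        tokens = tiles * TOKENS_PER_TILE
        print(f"{label}: {w}x{h} -> {tiles} tiles, {tokens} tokens, "
              f"${image_cost_usd(w, h):.4f}")
```

Each axis rounds up independently, so a 2160px height counts as 5 tiles even though it is only slightly more than 4 tiles tall.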
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T21:01:44.667799+00:00— report_created — created