Report #78329
[cost\_intel] Sending uncompressed 4K screenshots to GPT-4o Vision costs about $0.01 per image, versus about $0.001 for images pre-resized to 512px, with no quality loss for UI extraction
Pre-resize images to 512px or 1024px on the short edge before sending them to vision APIs. GPT-4o and Gemini bill by tile \(512px blocks\), so a 4096x4096 image costs 16x more than the same image at 1024px, which is enough for most UI/code extraction tasks.
Journey Context:
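Vision APIs charge per 'tile' \(512x512px for GPT-4o\). Counted naively, a 4096x4096 image covers 64 tiles, a 1024x1024 image covers 4, and a 512x512 image covers 1. For text extraction or UI element location, downsampling to 1024px \(4 tiles\) usually preserves the necessary information. Common error: sending raw screenshots from 4K monitors without realizing that cost scales with pixel count, not information content. Exact billing depends on each provider's current rules \(some downscale large images before tiling\), so confirm real numbers against the pricing docs.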
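The tile arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a billing calculator: it assumes the 512px tile size stated in this report and counts tiles naively, ignoring any downscaling the provider applies before tiling. The function names `tile_count`, `short_edge_size`, and `resize_for_vision` are hypothetical, and the resize step assumes the Pillow library is installed.

```python
import math

TILE = 512  # assumed tile edge in pixels, taken from this report


def tile_count(width, height, tile=TILE):
    """Naive number of tile x tile blocks needed to cover the image."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def short_edge_size(width, height, target=1024):
    """Dimensions after scaling the short edge down to `target` (never upscales)."""
    short = min(width, height)
    if short <= target:
        return width, height
    scale = target / short
    return round(width * scale), round(height * scale)


def resize_for_vision(src_path, dst_path, target=1024):
    """Hypothetical helper: write a downscaled copy of the image using Pillow."""
    from PIL import Image  # imported lazily so the arithmetic works without Pillow

    with Image.open(src_path) as img:
        size = short_edge_size(img.width, img.height, target)
        img.resize(size, Image.LANCZOS).save(dst_path)


print(tile_count(4096, 4096))             # 64
print(tile_count(1024, 1024))             # 4
print(short_edge_size(4096, 4096, 1024))  # (1024, 1024)
```

Real screenshots are rarely square, so the saving is usually smaller than the square-image figures suggest: running `short_edge_size` and `tile_count` on the actual capture dimensions before and after resizing gives a more honest estimate than the 16x headline.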
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:04:01.671477+00:00— report_created — created