Report #87877
[cost_intel] Sending 4K screenshots to Claude 3.5 Sonnet for UI analysis without tile cost awareness
For Claude 3.5 Sonnet, pre-resize images to 1568px on the long edge. Under the 512px-tile estimate used in this report, a 16:9 image then fits in about 6 tiles ($0.018 per image), versus 16 tiles ($0.048) for a 2048x2048 image. For GPT-4o, use 'low' detail mode (512px) unless OCR is required; the report estimates that this cuts vision costs by about 80% with minimal accuracy loss on classification tasks.
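The resize target can be worked out before touching any pixels. The following is a minimal sketch, not a definitive implementation: the function name `fit_long_edge` is hypothetical, and the 1568px default is simply the figure recommended above.

```python
def fit_long_edge(width: int, height: int, max_long_edge: int = 1568) -> tuple[int, int]:
    """Return (width, height) scaled so the longer side is at most max_long_edge.

    The aspect ratio is preserved and images already within the limit
    are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    # Multiply before dividing so exact ratios stay exact.
    new_width = max(1, round(width * max_long_edge / long_edge))
    new_height = max(1, round(height * max_long_edge / long_edge))
    return new_width, new_height


if __name__ == "__main__":
    # A 4K 16:9 screenshot and a square 2048px image.
    for size in [(3840, 2160), (2048, 2048), (800, 600)]:
        print(size, "->", fit_long_edge(*size))
```

The returned dimensions can be passed to whatever imaging library is already in use to perform the actual resize before upload.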
Journey Context:
Developers often assume that higher resolution always means better results from vision models. This report estimates Claude 3.5 Sonnet's image cost in 512x512px tiles at $0.003 per tile; that is an approximation, since Anthropic bills image input in tokens. On that model, a 2048x2048 screenshot is 16 tiles, or $0.048 for the image input alone. Resized to 1568px on the long edge, a 16:9 screenshot is about 6 tiles and costs $0.018 (6 * $0.003), while retaining roughly 95% of the detail needed for UI analysis or screenshot classification. GPT-4o charges by token (roughly 170 tokens per 512x512 tile) and also offers 'low' and 'high' detail modes. For most classification tasks (is this image a cat?), the 1568px resize for Claude or GPT-4o's low detail mode (512px) retains 95%+ accuracy while cutting costs by 5-10x. The trap is sending 4K screenshots to Claude for 'UI analysis' without realizing that each one consumes about 32 tiles ($0.10 per image).
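The tile arithmetic above can be reproduced with a short sketch. Everything here is an assumption taken from this report rather than from any provider's pricing page: the 512px tile size, the $0.003 per-tile rate, and the function names are illustrative. Tiles per side are rounded to the nearest whole tile because that is what reproduces the report's counts (6, 16, 32); rounding up instead would give 8 tiles for 1568x882 and 40 for 3840x2160.

```python
import math

TILE_PX = 512          # tile edge assumed by this report
USD_PER_TILE = 0.003   # per-tile rate quoted in this report, not an official price


def _round_half_up(x: float) -> int:
    """Round to the nearest integer, with .5 rounding up."""
    return math.floor(x + 0.5)


def estimate_tiles(width: int, height: int) -> int:
    """Estimate the tile count for an image under the report's tile model."""
    cols = max(1, _round_half_up(width / TILE_PX))
    rows = max(1, _round_half_up(height / TILE_PX))
    return cols * rows


def estimate_cost_usd(width: int, height: int) -> float:
    """Estimate the image input cost in USD under the report's tile model."""
    return estimate_tiles(width, height) * USD_PER_TILE


if __name__ == "__main__":
    for label, (w, h) in {
        "1568px 16:9": (1568, 882),
        "2048 square": (2048, 2048),
        "4K 16:9": (3840, 2160),
    }.items():
        print(f"{label}: {estimate_tiles(w, h)} tiles, ${estimate_cost_usd(w, h):.3f}")
```

Because the per-tile rate is a single constant, swapping in a measured token count or an updated price only requires changing `USD_PER_TILE` or replacing `estimate_tiles`.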
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:05:05.563414+00:00— report_created — created