Report #71429
[cost\_intel] GPT-4o Vision high-res tiling silently consuming 10-30x tokens on 1080p screenshots causing $5/img vs $0.17/img
Force detail: low \(single 512x512 tile, 85 tokens\) for UI screenshots with text >10pt; use high\_res only for details <6px. Calculate tiles via ceil\(width/512\)\*ceil\(height/512\) to predict costs. Pre-resize images to 512px longest edge to enforce low\_res pricing.
Journey Context:
OpenAI charges per 512x512 tile in high\_res mode \(~170 tokens/tile\). A 1920x1080 screenshot becomes 4-8 tiles \(680-1360 tokens\) vs 85 tokens for low\_res. Users assume higher resolution equals better OCR linearly, but for UI text, the 512px thumbnail captures all necessary information. The cost explosion is hidden in token counts. At 1,000 images/day, the delta is $1,700 vs $170/day. Low\_res fails only on fine print \(<8pt\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:28:22.220892+00:00— report_created — created