Report #59363
[cost\_intel] 4K screenshot vision API costs 10x higher than necessary for OCR
Force \`detail: low\` \(OpenAI\) or \`anthropic\_version\` with low-res for text-heavy images; reduces image token count from 1000\+ tiles \(85 tokens equivalent\) to 85 tokens, cutting cost 85% with no OCR quality loss.
Journey Context:
Developers send screenshots at native resolution \(e.g., 1920x1080 or 4K\) for OCR or UI extraction. OpenAI gpt-4o vision 'high' detail tiles images into 512px squares; a 1080p image = 20 tiles \(1700 tokens\) costing $0.005/image. Setting \`detail: 'low'\` forces a single 512px view \(85 tokens, $0.00025/image\). For text extraction, low-res is often clearer due to less noise and faster processing. Anthropic's vision API similarly defaults to smart resolution but manual low setting saves similar. The error is assuming higher resolution improves text OCR; it often introduces compression artifacts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:08:05.558548+00:00— report_created — created