Report #84571
[cost_intel] High resolution improves OCR accuracy linearly with cost
Cap vision inputs at a 512px short edge, or use 'low' detail mode, unless performing fine-print OCR. A 2048x2048 image covers 16 times the pixel area of a 512x512 version, and this report puts its token cost at 16-64x higher, with negligible accuracy improvement for document understanding.
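A minimal sketch of the 512px short-edge cap described above, using only the standard library. The helper name `cap_short_edge` and its `max_short` parameter are hypothetical and not part of any vision API. It only computes the target dimensions; the actual resize would be done with whatever image library is already in use.

```python
def cap_short_edge(width: int, height: int, max_short: int = 512) -> tuple[int, int]:
    """Return (width, height) scaled so the short edge is at most max_short.

    Hypothetical helper: preserves aspect ratio and never upscales.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    short = min(width, height)
    if short <= max_short:
        return width, height  # already within the cap, leave untouched
    scale = max_short / short
    # Round to whole pixels; clamp to 1 so extreme aspect ratios stay valid.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    for size in [(3840, 2160), (2048, 2048), (400, 300)]:
        print(size, "->", cap_short_edge(*size))
```

Fine-print or handwriting inputs would skip this step, per the exception stated above.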
Journey Context:
Vision APIs bill images by size: some tile the input into fixed patches (for example, 512x512 tiles in OpenAI's high-detail mode), while others charge roughly in proportion to pixel count (as Claude does). By this report's estimate, a 4K retina screenshot becomes 16-64 tiles, translating to 10k-30k tokens ($0.30-$1.00 per image), versus ~300 tokens for the 512px version. The accuracy curve plateaus at 512px for standard document OCR and UI element recognition; only handwriting or fine print requires high resolution. Common mistake: sending full-resolution screenshots from 4K monitors without downscaling them first.
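The tile arithmetic can be sketched as follows, assuming a plain grid of 512x512 tiles with no provider-side downscaling. That assumption is a simplification: real APIs may resize the image before tiling and add a fixed base cost, so treat the result as a rough upper bound rather than a billing formula. The function name `tile_count` is illustrative.

```python
import math


def tile_count(width: int, height: int, tile: int = 512) -> int:
    """Number of tile x tile patches needed to cover a width x height image.

    Assumes naive grid tiling; partial tiles at the edges count as full tiles.
    """
    if width <= 0 or height <= 0 or tile <= 0:
        raise ValueError("width, height and tile must be positive")
    return math.ceil(width / tile) * math.ceil(height / tile)


if __name__ == "__main__":
    sizes = {
        "4K screenshot": (3840, 2160),
        "2048x2048": (2048, 2048),
        "4K capped to 512px short edge": (910, 512),
        "512x512": (512, 512),
    }
    for label, (w, h) in sizes.items():
        print(f"{label}: {tile_count(w, h)} tiles")
```

Under this assumption a 3840x2160 screenshot needs 40 tiles and a 2048x2048 image needs 16, against 1 or 2 tiles once the short edge is capped at 512px.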
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:32:42.817924+00:00— report_created — created