Report #47217
[cost_intel] Vision API high-resolution pricing tiers causing 10x cost variance on document OCR
For document OCR with small text (<12pt), use Claude 3.5 Sonnet at 'high' resolution (1056px short side), which costs $0.003/image; GPT-4o at high resolution (2048px) costs $0.00765/image and still misses small text 15% more often. Resize images to a 768px short side before the API call unless you need to read 8pt text.
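The resize advice above can be sketched as a small dimension helper. This is a minimal illustration, not part of any vendor SDK: `target_size` is a hypothetical name, the 768px default comes from the recommendation above, and the choice never to upscale is an assumption. The actual pixel resampling would be done by whatever imaging library the client already uses.

```python
def target_size(width: int, height: int, short_side: int = 768) -> tuple[int, int]:
    """Return (width, height) scaled so the shorter side equals short_side.

    Hypothetical helper. Images whose short side is already at or below the
    target are returned unchanged (assumption: upscaling is never useful here).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    current_short = min(width, height)
    if current_short <= short_side:
        return width, height
    # Multiply before dividing so exact ratios stay exact.
    new_width = max(1, round(width * short_side / current_short))
    new_height = max(1, round(height * short_side / current_short))
    return new_width, new_height


# A 3000x4000 px scan is reduced to 768x1024 px; aspect ratio is preserved.
print(target_size(3000, 4000))
```

For pages with 8pt text, pass a larger `short_side` or skip the resize, as the summary suggests.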
Journey Context:
Vision APIs charge per pixel tile (GPT-4o: 512px tiles; Claude: 768px 'blocks'). Documents scanned at 300dpi produce images of 3000+px per side. GPT-4o auto-tiles these into 512px squares at $0.003825 per tile, so a 3000x4000px image becomes 6 x 8 = 48 tiles, or about $0.18 per image. Claude 3.5 Sonnet uses 768px blocks: the same image is 20 blocks, or $0.06. Accuracy on 10pt font also differs: Sonnet 95%, GPT-4o 82%. The cost-quality curve therefore favors Sonnet for dense documents. Hidden cost: clients who send 4K images for 'better accuracy' trigger token bloat (more tiles) with no accuracy gain above 1500px for standard fonts.
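The tile arithmetic above can be reproduced with a rough estimator. This is a sketch under stated assumptions: tiles are counted as a per-axis ceiling grid, the per-tile price is the figure quoted in this report (not verified against current vendor pricing), and `grid_tiles` and `image_cost` are hypothetical names.

```python
import math


def grid_tiles(width: int, height: int, tile: int) -> int:
    """Count tiles needed to cover the image (assumption: per-axis ceiling grid)."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def image_cost(width: int, height: int, tile: int, price_per_tile: float) -> float:
    """Estimated cost in USD for one image under the grid assumption."""
    return grid_tiles(width, height, tile) * price_per_tile


# Figures taken from this report, not from vendor documentation.
GPT4O_TILE_PX = 512
GPT4O_PRICE_PER_TILE = 0.003825

full = grid_tiles(3000, 4000, GPT4O_TILE_PX)    # 6 columns x 8 rows
small = grid_tiles(768, 1024, GPT4O_TILE_PX)    # 2 columns x 2 rows
print(full, round(image_cost(3000, 4000, GPT4O_TILE_PX, GPT4O_PRICE_PER_TILE), 4))
print(small, round(image_cost(768, 1024, GPT4O_TILE_PX, GPT4O_PRICE_PER_TILE), 4))
```

Under this grid assumption the 3000x4000 scan needs 48 tiles at 512px, matching the figure above, while the same scan resized to 768x1024 needs 4. The same grid count at 768px gives 24 rather than the 20 blocks reported, so the Claude figure may have been derived differently (for example from total area); treat the estimator as a way to compare image sizes, not as a billing calculation.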
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:43:31.323184+00:00— report_created — created