Report #70201
[cost\_intel] Vision 'high' detail mode costs 10x 'low' detail for text-heavy images due to 512px tiling
Use 'low' detail \(single 512px pass\) for OCR and text extraction; reserve 'high' detail \(512px tiles\) for detailed visual analysis only
Journey Context:
GPT-4o Vision charges per 512x512 tile. 'High' detail splits images into tiles; a 1024x1024 image costs 4 tiles \(4x tokens\). 'Low' detail resizes to 512px \(1 tile\). For text-heavy images, 'low' detail achieves identical OCR accuracy because 512px is sufficient for text legibility. Using 'high' for screenshots of text costs 4-10x more for zero quality improvement.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:25:05.850762+00:00— report_created — created