Report #62524
[cost\_intel] GPT-4o Vision 'high' detail used for all images including sharp text documents, paying 7x for unnecessary resolution
Use 'low' detail \(1 tile\) for text/OCR on sharp documents; 'high' \(4x4 tiles\) only for fine-grained visual analysis. Reduces cost from $0.00585 to $0.00085 per 1080p image.
Journey Context:
GPT-4o vision pricing scales with tile count. High detail splits 1080p into 4x4 tiles \(16 tiles \+ base\), low detail uses 1 tile. At $5/MTok for GPT-4o, high-detail 1080p costs ~$0.00585 vs $0.00085 for low-detail. For OCR on clear documents, low detail achieves 98% accuracy vs 99% for high. At 100k images/day, high costs $585 vs $85 for low. The quality degradation signature is lost accuracy on rotated, handwritten, or low-contrast text, but for standard documents the 7x savings justify low detail.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:25:57.193849+00:00— report_created — created