Report #52950
[cost_intel] Using GPT-4o vision for OCR on clean documents instead of Claude 3 Haiku
For text extraction from high-quality scans (printed text, white background), Claude 3 Haiku matches GPT-4o's OCR accuracy to within 2% at roughly 4x lower cost ($0.0015 vs. $0.00585 per 4K image). Reserve GPT-4o for charts, diagrams, handwriting, or poor-quality scans, where spatial reasoning is required.
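To make the cost gap concrete, here is a minimal sketch that projects the per-image prices quoted above onto a batch. The two prices come from this report and are not live pricing; the `batch_cost` and `cost_ratio` helpers and the 100,000-page volume are hypothetical, chosen only for illustration.

```python
# Per-image prices quoted in this report (USD, one 4K image); not live pricing.
HAIKU_PER_IMAGE = 0.0015
GPT4O_PER_IMAGE = 0.00585


def batch_cost(pages: int, per_image: float) -> float:
    """Total cost in USD for OCR-ing `pages` images at a fixed per-image price."""
    return pages * per_image


def cost_ratio() -> float:
    """How many times more GPT-4o costs per image than Haiku, per the quoted prices."""
    return GPT4O_PER_IMAGE / HAIKU_PER_IMAGE


if __name__ == "__main__":
    pages = 100_000  # hypothetical monthly volume
    haiku = batch_cost(pages, HAIKU_PER_IMAGE)
    gpt4o = batch_cost(pages, GPT4O_PER_IMAGE)
    print(f"ratio: {cost_ratio():.1f}x")
    print(f"Haiku: ${haiku:,.2f}  GPT-4o: ${gpt4o:,.2f}  saved: ${gpt4o - haiku:,.2f}")
```

With the quoted figures the ratio works out to 3.9x, which is where the "roughly 4x" claim comes from. Re-run the numbers with current provider pricing before relying on them.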
Journey Context:
Teams often default to GPT-4o for all vision tasks on the assumption that 'best model = best results,' overlooking that OCR of clean printed text is effectively a solved task for smaller vision models. Haiku's vision encoder recognizes document text in standard fonts with near-perfect accuracy. GPT-4o's cost scales with image size because images are billed in 512px tiles, while Haiku offers flat low rates. The quality gap only emerges on complex layouts or handwritten text.
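One way to act on this is a simple routing rule: send clean printed scans to the cheaper model and escalate only the hard cases. The sketch below is illustrative only. `ScanProfile`, its flags, and `pick_model` are hypothetical names, the model labels are placeholders rather than exact API model IDs, and it assumes the document flags are set upstream (by a classifier or by hand).

```python
from dataclasses import dataclass

# Illustrative labels only; substitute the exact model IDs your provider exposes.
CHEAP_MODEL = "claude-3-haiku"
STRONG_MODEL = "gpt-4o"


@dataclass
class ScanProfile:
    """Hypothetical per-document flags, assumed to be set upstream."""
    printed_text: bool = True
    clean_background: bool = True
    has_charts_or_diagrams: bool = False
    has_handwriting: bool = False


def pick_model(scan: ScanProfile) -> str:
    """Return the cheap model for clean printed scans, the strong one otherwise."""
    needs_strong = (
        scan.has_charts_or_diagrams
        or scan.has_handwriting
        or not scan.printed_text
        or not scan.clean_background
    )
    return STRONG_MODEL if needs_strong else CHEAP_MODEL


if __name__ == "__main__":
    print(pick_model(ScanProfile()))                      # clean printed scan
    print(pick_model(ScanProfile(has_handwriting=True)))  # handwritten note
```

Defaulting every flag to the "clean" case keeps the cheap path the default, so a document is escalated only when something explicitly marks it as hard. Spot-check accuracy on a sample of your own documents before trusting the split.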
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:22:21.541250+00:00— report_created — created