Report #52941
[cost\_intel] Defaulting to GPT-4o Vision for all document processing assuming it's always superior to traditional OCR
Use AWS Textract or Azure Document Intelligence for clean, structured documents \(scans >300 DPI, standard fonts, printed text\); reserve GPT-4o Vision for 'messy' inputs \(handwriting, low-light photography, skewed angles, complex multi-column layouts\) or when visual reasoning is required. Expect roughly a 3.3x cost reduction on clean docs \($0.0015 vs $0.005 per page\) and about double the accuracy on messy docs \(90% vs 45% on handwritten notes\).
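To make the savings concrete, here is a minimal sketch of the cost arithmetic. It assumes the per-page prices quoted in this report; the function name `monthly_cost`, the page volume, and the share of messy pages are hypothetical and only for illustration.

```python
# Per-page prices as quoted in this report; treat them as illustrative,
# not as current list prices.
TEXTRACT_PER_PAGE = 0.0015
GPT4O_VISION_PER_PAGE = 0.005


def monthly_cost(pages: int, messy_fraction: float) -> dict[str, float]:
    """Compare sending every page to Vision against routing only messy pages there."""
    if not 0.0 <= messy_fraction <= 1.0:
        raise ValueError("messy_fraction must be between 0 and 1")
    messy_pages = pages * messy_fraction
    clean_pages = pages - messy_pages
    # Baseline: every page goes through GPT-4o Vision.
    all_vision = pages * GPT4O_VISION_PER_PAGE
    # Routed: clean pages go to traditional OCR, messy pages to Vision.
    routed = clean_pages * TEXTRACT_PER_PAGE + messy_pages * GPT4O_VISION_PER_PAGE
    return {"all_vision": all_vision, "routed": routed, "savings": all_vision - routed}


# Hypothetical workload: 100,000 pages per month, 20% of them messy.
print(monthly_cost(pages=100_000, messy_fraction=0.2))
```

For that hypothetical workload the all-Vision baseline comes to about $500 per month and the routed pipeline to about $220, so the saving depends heavily on how many pages are actually clean.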
Journey Context:
Engineering teams building document processing pipelines default to GPT-4o Vision \($0.005 per image\) for all OCR tasks, assuming 'AI vision is better than old OCR.' For clean printed documents \(invoices, forms, tax documents\), this wastes money: AWS Textract costs $0.0015 per page with 95% field accuracy on clean scans, while GPT-4o costs about 3.3x as much at $0.005 per image with equivalent accuracy on clean text. For messy inputs, however \(handwritten notes, photos taken in low light with glare, documents with complex multi-column newspaper layouts or warped perspective\), traditional OCR drops to 40-60% accuracy while GPT-4o maintains 85-90% thanks to its visual reasoning capabilities. The decision boundary is image quality and content type: if the document is a scanned PDF with a text layer or a high-DPI print, use traditional OCR; if it's a photo of a crumpled receipt in dim lighting, use Vision.
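The decision boundary described above could be sketched as a small routing function. This is an illustration under stated assumptions, not a tested pipeline: `DocumentProfile`, `Backend`, and `choose_backend` are hypothetical names, the flags would have to come from a pre-check step this report does not describe, and sending ambiguous low-DPI scans to Vision is an assumption that favors accuracy over cost.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    TRADITIONAL_OCR = "traditional_ocr"  # e.g. AWS Textract or Azure Document Intelligence
    VISION_LLM = "vision_llm"            # e.g. GPT-4o Vision


@dataclass
class DocumentProfile:
    """Hypothetical descriptor of one input; how it gets filled in is up to the pipeline."""
    has_text_layer: bool = False
    dpi: int = 0
    is_low_light_photo: bool = False
    has_handwriting: bool = False
    is_skewed_or_warped: bool = False
    has_complex_layout: bool = False
    needs_visual_reasoning: bool = False


def choose_backend(doc: DocumentProfile, min_dpi: int = 300) -> Backend:
    """Route clean documents to traditional OCR and messy ones to a vision model."""
    messy = (
        doc.is_low_light_photo
        or doc.has_handwriting
        or doc.is_skewed_or_warped
        or doc.has_complex_layout
        or doc.needs_visual_reasoning
    )
    if messy:
        return Backend.VISION_LLM
    # Clean input: an embedded text layer or a high-DPI scan (threshold is adjustable).
    if doc.has_text_layer or doc.dpi >= min_dpi:
        return Backend.TRADITIONAL_OCR
    # Assumption: ambiguous low-DPI scans go to Vision, trading cost for accuracy.
    return Backend.VISION_LLM


print(choose_backend(DocumentProfile(has_text_layer=True)))
print(choose_backend(DocumentProfile(dpi=600, has_handwriting=True)))
```

Checking the messy flags first means a high-DPI scan of handwritten notes still goes to Vision, which matches the rule that content type matters as much as image quality.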
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:21:28.937729+00:00— report_created — created