Report #57348
[cost\_intel] Where does GPT-4 Vision cost 20x Claude 3 Haiku \+ OCR for document extraction?
For text-dense document extraction \(invoices, receipts\), run Tesseract OCR first and send the extracted text to Claude 3 Haiku instead of sending page images to GPT-4 Vision. Haiku costs $0.25 input / $1.25 output per 1M tokens; GPT-4V costs $10 / $30 per 1M tokens, and each image adds 170 input tokens per 512x512 tile \(about $0.0017 per tile at the $10 input rate\). A 10-page PDF costs roughly $0.03 with Haiku\+OCR versus $0.60 with GPT-4V, a 20x difference. Reserve GPT-4V for spatial layouts \(charts, diagrams\) where OCR destroys structure.
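The routing rule above can be sketched as a small Python function. This is a minimal illustration, not the report author's pipeline: `Page`, `choose_extractor`, and the `MIN_OCR_CONFIDENCE` cutoff of 80 are hypothetical names and values, and the OCR confidence is assumed to come from whatever OCR step has already run.

```python
from dataclasses import dataclass
from typing import Optional

# Layout-heavy kinds named in the report; extend for your own corpus.
SPATIAL_KINDS = {"chart", "diagram"}

# Hypothetical cutoff: below this mean OCR confidence (0-100), use vision instead.
MIN_OCR_CONFIDENCE = 80.0


@dataclass
class Page:
    kind: str                               # e.g. "invoice", "receipt", "chart"
    ocr_confidence: Optional[float] = None  # mean OCR confidence, if OCR already ran


def choose_extractor(page: Page) -> str:
    """Return "gpt-4v" for layout-heavy or poorly OCR'd pages, else "haiku+ocr"."""
    if page.kind in SPATIAL_KINDS:
        return "gpt-4v"
    if page.ocr_confidence is not None and page.ocr_confidence < MIN_OCR_CONFIDENCE:
        return "gpt-4v"
    return "haiku+ocr"


if __name__ == "__main__":
    for page in (Page("invoice", 95.0), Page("chart"), Page("receipt", 40.0)):
        print(page.kind, "->", choose_extractor(page))
```

Under these assumptions, a cleanly scanned invoice stays on the cheap path, while a chart or a low-confidence scan is sent to GPT-4V.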
Journey Context:
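Engineers often send raw images to GPT-4V on the assumption that it "sees" everything, overlooking per-image tile costs and the resulting token growth. The mistake is treating vision input as free: GPT-4V bills 170 tokens per 512x512 tile. Under OpenAI's high-detail accounting, a 1920x1080 image is downscaled to about 1365x768, which covers 6 tiles: 6 x 170 \+ 85 base tokens = 1,105 input tokens per page image \(tiling the raw resolution would give 12 tiles and 2,040 tokens\). Haiku processes the OCR'd text at about 1/20th the cost, with comparable accuracy on plain text. The degradation signature is OCR failing on handwriting or complex layouts such as tables; those pages still need GPT-4V. The practical fix is a preprocessing pipeline that routes text pages through OCR and layout-heavy pages to vision. The cost cliff shows up at volume: at 10K ten-page PDFs per day, GPT-4V costs about $6,000 per day versus $300 for Haiku\+OCR.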
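The tile arithmetic in the paragraph above can be checked with a short Python sketch. It assumes the 512-pixel tile size and 170 tokens per tile quoted in this report, plus the resize rule (fit within 2048x2048, then short side down to 768 px) and the 85-token base charge from OpenAI's published high-detail guidance. The function names and the rounding to whole pixels are illustrative assumptions, so verify against current pricing documentation before budgeting.

```python
import math

TILE_PX = 512          # tile edge length, from the report
TOKENS_PER_TILE = 170  # from the report
BASE_TOKENS = 85       # assumption: fixed per-image overhead in high-detail mode


def naive_tiles(width: int, height: int) -> int:
    """Tiles needed to cover the raw resolution, with no resizing."""
    return math.ceil(width / TILE_PX) * math.ceil(height / TILE_PX)


def high_detail_tokens(width: int, height: int) -> int:
    """Token estimate under the assumed resize-then-tile rule."""
    w, h = width, height
    # Step 1: shrink (never enlarge) to fit inside a 2048x2048 square.
    scale = min(1.0, 2048 / max(w, h))
    w, h = round(w * scale), round(h * scale)
    # Step 2: shrink (never enlarge) so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)
    tiles = math.ceil(w / TILE_PX) * math.ceil(h / TILE_PX)
    return tiles * TOKENS_PER_TILE + BASE_TOKENS


def tokens_to_usd(tokens: int, usd_per_million: float = 10.0) -> float:
    """Convert input tokens to dollars at a per-1M-token rate."""
    return tokens * usd_per_million / 1_000_000


if __name__ == "__main__":
    print("raw tiling:", naive_tiles(1920, 1080), "tiles")
    print("high detail:", high_detail_tokens(1920, 1080), "tokens")
    print("cost per image (USD):", tokens_to_usd(high_detail_tokens(1920, 1080)))
```

Image tokens are only part of the per-page bill: prompt text and the extracted output, billed at the higher output rate, are added on top.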
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:44:45.090983+00:00— report_created — created