Report #52562
[cost_intel] Assuming OCR + structured extraction requires frontier vision models
For structured JSON extraction from semi-clean documents, Claude 3 Haiku with vision achieves more than 95% of Sonnet's accuracy at roughly one-tenth the cost, provided you pre-process with a dedicated OCR tool (Amazon Textract or Tesseract) rather than relying on the LLM for OCR. Do not use LLM vision for text-heavy PDFs.
Journey Context:
Teams default to Sonnet or GPT-4o for 'messy data' extraction, but vision LLMs are worse OCR engines than dedicated tools and cost 10x more per token. The hard-won insight: LLMs are excellent structure extractors but terrible image decoders. The winning architecture separates concerns: a cheap, specialized OCR tool extracts the text, and a cheap LLM (Haiku) structures it. This approach fails only for documents where layout carries semantic meaning (tables with complex spanning cells, handwritten notes). In those cases, Sonnet's spatial reasoning justifies the cost.
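The separation of concerns described above can be sketched as a small router. This is a minimal illustration, not a tested production pipeline: `Document`, `extract`, the `layout_is_semantic` flag, and the model labels `"haiku"` and `"sonnet"` are hypothetical names introduced here, and the OCR and LLM calls are passed in as plain callables so that any backend (Textract, Tesseract, an LLM client) can be plugged in.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical model labels for illustration only; substitute the
# real model identifiers your provider expects.
CHEAP_TEXT_MODEL = "haiku"
VISION_MODEL = "sonnet"


@dataclass
class Document:
    image_bytes: bytes
    # Assumption: the caller decides up front whether layout carries meaning
    # (complex spanning-cell tables, handwritten notes).
    layout_is_semantic: bool


def extract(
    doc: Document,
    ocr: Callable[[bytes], str],
    structure_text: Callable[[str, str], str],
    structure_image: Callable[[str, bytes], str],
) -> dict:
    """Route a document to the cheapest path that can handle it.

    ocr(image_bytes) -> plain text
    structure_text(model, text) -> JSON string
    structure_image(model, image_bytes) -> JSON string
    """
    if doc.layout_is_semantic:
        # Layout matters: pay for the vision model's spatial reasoning.
        raw = structure_image(VISION_MODEL, doc.image_bytes)
    else:
        # Text-heavy: dedicated OCR first, then a cheap LLM structures the text.
        text = ocr(doc.image_bytes)
        raw = structure_text(CHEAP_TEXT_MODEL, text)
    return json.loads(raw)
```

Injecting the three callables keeps the routing decision separate from any particular vendor SDK, so the cheap path and the expensive path can be swapped or benchmarked independently.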
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:43:15.728053+00:00— report_created — created