Report #76000
[cost_intel] When are frontier models irreplaceable for unstructured document extraction?
For multi-page, messy PDFs with tables, handwriting, and non-standard layouts (invoices, medical records), GPT-4o and Claude 3.5 Sonnet achieve >90% extraction accuracy, while Gemini Flash and Haiku drop below 60%. On these documents, the cost of correcting errors exceeds the $0.02-0.05/page premium the frontier models charge.
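A back-of-the-envelope check makes the trade-off concrete. This sketch models expected cost per page as API price plus (1 - accuracy) × human fix cost. It assumes "accuracy" means the fraction of pages extracted with no error and that each bad page needs exactly one fix. The cheap model's base price of $0.01/page is a hypothetical placeholder; the accuracies, premium, and fix cost are the figures quoted in this report. The function names are illustrative only.

```python
"""Back-of-the-envelope cost model for document extraction (hypothetical sketch)."""


def expected_cost_per_page(api_price: float, accuracy: float, fix_cost: float) -> float:
    """API price plus the expected cost of one human fix per erroneous page.

    Assumption: `accuracy` is the fraction of pages extracted with no error,
    and every erroneous page needs exactly one fix costing `fix_cost`.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return api_price + (1.0 - accuracy) * fix_cost


def break_even_premium(cheap_accuracy: float, frontier_accuracy: float, fix_cost: float) -> float:
    """Largest per-page premium at which the frontier model still costs no more overall."""
    return (frontier_accuracy - cheap_accuracy) * fix_cost


if __name__ == "__main__":
    CHEAP_PRICE = 0.01  # hypothetical $/page for the cheap model
    PREMIUM = 0.05      # top of the quoted $0.02-0.05/page premium
    FIX_COST = 0.50     # low end of the quoted $0.50-$2.00 per error

    cheap = expected_cost_per_page(CHEAP_PRICE, accuracy=0.60, fix_cost=FIX_COST)
    frontier = expected_cost_per_page(CHEAP_PRICE + PREMIUM, accuracy=0.90, fix_cost=FIX_COST)
    print(f"cheap model:    ${cheap:.2f}/page")
    print(f"frontier model: ${frontier:.2f}/page")
    print(f"break-even premium: ${break_even_premium(0.60, 0.90, FIX_COST):.2f}/page")
```

Under these assumptions the break-even premium is (0.90 - 0.60) × $0.50 = $0.15/page even at the cheapest fix cost, three times the top of the quoted premium range; at $2.00 per fix it rises to $0.60/page.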
Journey Context:
People default to cheap models for OCR+extraction pipelines, but smaller models fail on "messy" reasoning: they hallucinate when tables span pages, miss handwritten annotations, or merge columns. The failure mode is silent structured hallucination (e.g., swapping invoice line items): the output is well-formed but wrong, so nothing flags it. The cost of a human-in-the-loop fix ($0.50-$2.00 per error) dwarfs the API savings. Frontier models are irreplaceable when the input has high entropy (handwriting, complex layouts) AND the accuracy requirement is high (>95%).
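The decision rule in the last sentence can be expressed as a routing check. This is a minimal sketch: `DocumentTraits`, `needs_frontier_model`, and the three boolean signals are hypothetical names, and how those signals are detected (for example, a cheap layout-analysis pre-pass) is assumed rather than shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentTraits:
    """Hypothetical signals about a document, e.g. from a cheap layout pre-pass."""
    has_handwriting: bool = False
    has_cross_page_tables: bool = False
    has_nonstandard_layout: bool = False


def needs_frontier_model(traits: DocumentTraits, required_accuracy: float,
                         accuracy_threshold: float = 0.95) -> bool:
    """Route to a frontier model only when the input is high-entropy AND the
    accuracy requirement exceeds the threshold (95% by default)."""
    high_entropy = (
        traits.has_handwriting
        or traits.has_cross_page_tables
        or traits.has_nonstandard_layout
    )
    return high_entropy and required_accuracy > accuracy_threshold


if __name__ == "__main__":
    handwritten_record = DocumentTraits(has_handwriting=True)
    clean_form = DocumentTraits()
    print(needs_frontier_model(handwritten_record, required_accuracy=0.99))  # True
    print(needs_frontier_model(clean_form, required_accuracy=0.99))          # False
    print(needs_frontier_model(handwritten_record, required_accuracy=0.90))  # False
```

Requiring both conditions keeps clean or low-stakes documents on the cheaper model, which is where the API savings actually come from.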
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:09:43.335641+00:00— report_created — created