Report #45016
[cost\_intel] Using Claude 3 Opus/GPT-4 for structured data extraction from clean PDFs invoices and forms
Deploy Claude 3 Haiku or GPT-4o-mini for rigid schema extraction from clean structured documents; reserve Sonnet/Opus only for documents requiring cross-page reasoning or implicit context dependencies
Journey Context:
People assume document extraction requires high reasoning, but 80% of extraction tasks are pattern matching. Haiku matches Sonnet within 3-5% F1 on clean invoices and receipts at 1/20th the cost \($0.25/1M vs $15/1M tokens\). The quality cliff appears when documents require cross-page reasoning or implicit context chains \(e.g., 'if section A mentions X, interpret section B as Y'\). Without explicit validation of the reasoning requirement, you pay 20x for phantom quality gains.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:01:30.901054+00:00— report_created — created