Report #59710
[cost_intel] o1 costs 20x more for invoice parsing but matches GPT-4o's accuracy on standard fields
Use GPT-4o or Haiku for flat-schema extraction (receipts, simple forms). Reserve o1/o3 for hierarchical extraction with conditional dependencies (e.g., insurance policies with riders that vary by state) or for ambiguous handwriting.
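The routing rule above can be sketched as a small function. This is an illustration only: `ExtractionTask`, its fields, and the tier labels `"standard"` and `"reasoning"` are hypothetical names, not part of any vendor API. Map `"standard"` to GPT-4o or Haiku and `"reasoning"` to o1/o3 in your own client code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionTask:
    """Hypothetical description of a document type to extract."""
    nested_levels: int = 1           # 1 means a flat schema
    conditional_fields: int = 0      # fields whose meaning depends on other fields
    ambiguous_handwriting: bool = False


def pick_model_tier(task: ExtractionTask) -> str:
    """Return 'reasoning' only where the report says it pays off."""
    if task.ambiguous_handwriting:
        return "reasoning"
    # Hierarchical schema AND conditional dependencies together.
    if task.nested_levels > 1 and task.conditional_fields > 0:
        return "reasoning"
    return "standard"


receipt = ExtractionTask()  # flat schema, no conditionals
policy = ExtractionTask(nested_levels=3, conditional_fields=4)
print(pick_model_tier(receipt), pick_model_tier(policy))
```

The thresholds are assumptions. Tune them against your own accuracy and cost measurements before relying on them.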
Journey Context:
People assume document extraction is a 'hard AI problem.' But for fixed templates (W-2s), regex + GPT-4o is 99% accurate, and reasoning adds nothing. The failure mode is conditional logic: 'If line 7 is checked, then box 12A is actually a date, not a currency.' Cheap models hallucinate structure in these cases; reasoning models trace the logic chain.
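To make the conditional dependency concrete, here is a minimal sketch of the 'line 7 / box 12A' rule as a deterministic parser. The function name, the MM/DD/YYYY date format, and the currency cleanup are all assumptions for illustration; a real form would define its own formats.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Union


def parse_box_12a(raw: str, line_7_checked: bool) -> Union[date, Decimal]:
    """Interpret box 12A according to the state of line 7 (hypothetical rule)."""
    text = raw.strip()
    if line_7_checked:
        # Assumed date format: MM/DD/YYYY. Raises ValueError if it does not match.
        return datetime.strptime(text, "%m/%d/%Y").date()
    # Otherwise treat the value as a currency amount.
    return Decimal(text.replace("$", "").replace(",", ""))


print(parse_box_12a("03/15/2024", line_7_checked=True))
print(parse_box_12a("$1,250.00", line_7_checked=False))
```

A check like this can also run over a cheap model's output: if the value fails to parse under the branch that line 7 selects, that is a signal the structure was misread and the document may be worth escalating.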
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:42:38.556640+00:00— report_created — created