Report #92541
[cost_intel] Using o1 for extracting tables from scanned PDFs and hitting $5 per document with no accuracy gain
For structured extraction from messy documents (tables, forms), use GPT-4o with vision, Pydantic schema constraints, and retry loops ($0.01-0.05 per document, about 85% accuracy). Reserve o1 for documents where GPT-4o fails validation three times (roughly the top 5% by complexity). Cost curve: GPT-4o plateaus at about 85%; o1 reaches 95% but at roughly 100x the cost. Break-even: use o1 only when the document is worth more than $50 or a downstream error costs more than $500.
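The retry-then-escalate policy described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `cheap` and `strong` are hypothetical callables standing in for a GPT-4o call and an o1 call, `is_valid` stands in for schema validation (for example a Pydantic model check), and the limit of three attempts is taken from the summary.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical signatures: an extractor maps raw document bytes to a dict,
# a validator decides whether that dict satisfies the target schema.
Extractor = Callable[[bytes], dict]
Validator = Callable[[dict], bool]


@dataclass
class ExtractionResult:
    data: Optional[dict]   # None if even the strong model failed validation
    model_used: str        # "cheap" or "strong"
    cheap_attempts: int    # how many cheap-model calls were made


def extract_with_escalation(
    doc: bytes,
    cheap: Extractor,
    strong: Extractor,
    is_valid: Validator,
    max_cheap_attempts: int = 3,
) -> ExtractionResult:
    """Try the cheap model up to max_cheap_attempts times, then escalate once."""
    for attempt in range(1, max_cheap_attempts + 1):
        candidate = cheap(doc)
        if is_valid(candidate):
            return ExtractionResult(candidate, "cheap", attempt)

    # All cheap attempts failed validation: pay for one strong-model call.
    candidate = strong(doc)
    data = candidate if is_valid(candidate) else None
    return ExtractionResult(data, "strong", max_cheap_attempts)
```

Returning `model_used` and `cheap_attempts` makes it possible to log how often escalation happens, which is what tells you whether the share of documents needing the expensive model really stays near the stated 5%.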
Journey Context:
Engineers often reach for the strongest model for extraction, but reasoning models add cost without improving OCR or basic pattern matching. o1 and o3 don't "see" better; they overthink simple formatting. The cost-per-correct-answer curve is L-shaped: GPT-4o with vision reaches 80-85% accuracy for pennies per document, while the final 10-15% requires o1/o3 and costs dollars per document. For most RAG pipelines, 85% extraction accuracy suffices because embedding search tolerates noise. Deploy reasoning models only when (1) the document is handwritten and highly technical, (2) validation requires logical deduction across non-contiguous fields, or (3) the error cost exceeds $1000 per mistake. The signal that reasoning is needed: GPT-4o produces logically contradictory extractions (e.g., totals that don't sum), not merely OCR errors.
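One way to tell the two failure modes apart is a simple arithmetic check on the extracted table. The sketch below is a heuristic under stated assumptions: the field name `amount`, the function name `classify_extraction`, and the one-cent tolerance are all hypothetical, and a mismatched total can still come from a misread digit rather than a reasoning failure.

```python
from decimal import Decimal, InvalidOperation


def classify_extraction(rows, stated_total, tolerance=Decimal("0.01")):
    """Classify an extracted table as 'ok', 'ocr_error', or 'contradiction'.

    rows: list of dicts, each with a hypothetical 'amount' field (string).
    stated_total: the total as printed on the document (string).
    """
    try:
        amounts = [Decimal(row["amount"]) for row in rows]
        total = Decimal(stated_total)
    except (InvalidOperation, KeyError, TypeError):
        # Values that do not parse as numbers look like character-level
        # misreads, which a reasoning model is unlikely to fix.
        return "ocr_error"

    if abs(sum(amounts) - total) > tolerance:
        # Every field parsed, but the numbers disagree with each other:
        # the kind of logical contradiction that may justify escalation.
        return "contradiction"
    return "ok"
```

For example, `classify_extraction([{"amount": "10.00"}, {"amount": "5.50"}], "16.50")` returns `"contradiction"`, while an amount such as `"1O.00"` (letter O instead of zero) returns `"ocr_error"`.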
Lifecycle
2026-06-22T13:55:18.091357+00:00— report_created — created