Report #53979
[cost_intel] Complex multi-hop document extraction requiring arithmetic or cross-reference validation
o1/o3 are required when extraction involves calculation (e.g., 'calculate profit margin from revenue and cost fields on different pages'); cheaper models hallucinate values in 30-40% of cross-page reasoning cases
Journey Context:
For simple key-value extraction (name, date), GPT-4o is 99% accurate and 50x cheaper. But when the schema requires 'total amount = sum of line items', with the line items in a table and the total in running text, GPT-4o often miscalculates or hallucinates values. o3's reasoning chain validates the math across document locations, cutting the error rate from 35% to <2%. The failure signature is an arithmetic inconsistency or a cross-reference mismatch in the output.
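The failure signature described above can be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes the extraction result is a dict with hypothetical field names (`line_items`, `total_amount`, `revenue`, `cost`, `profit_margin`) and flags outputs whose arithmetic does not hold, so that only those documents are re-run on a reasoning model.

```python
from decimal import Decimal

# All field names below are hypothetical; adapt them to your own schema.
def find_inconsistencies(extracted, tolerance=Decimal("0.01")):
    """Return a list of arithmetic/cross-reference problems in one extraction."""
    problems = []

    # Check 1: total amount must equal the sum of the line items.
    items = [Decimal(str(x)) for x in extracted.get("line_items", [])]
    total = extracted.get("total_amount")
    if items and total is not None:
        expected_total = sum(items, Decimal("0"))
        if abs(expected_total - Decimal(str(total))) > tolerance:
            problems.append(
                f"total_amount {total} != sum of line items {expected_total}"
            )

    # Check 2: profit margin must match (revenue - cost) / revenue.
    revenue = extracted.get("revenue")
    cost = extracted.get("cost")
    margin = extracted.get("profit_margin")
    if None not in (revenue, cost, margin):
        r, c = Decimal(str(revenue)), Decimal(str(cost))
        if r != 0:
            expected_margin = (r - c) / r
            if abs(expected_margin - Decimal(str(margin))) > tolerance:
                problems.append(
                    f"profit_margin {margin} != computed {expected_margin}"
                )

    return problems


def needs_escalation(extracted):
    """True when the cheap model's output should be re-run on a reasoning model."""
    return bool(find_inconsistencies(extracted))
```

A consistent result such as `{"line_items": ["10.00", "5.50"], "total_amount": "15.50"}` passes, while the same line items with a total of `"16.50"` would be flagged for escalation. This check only catches inconsistencies between extracted fields; it cannot detect values that are wrong but mutually consistent.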
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:05:56.549081+00:00— report_created — created