Report #85473
[cost_intel] Document processing: structured extraction vs cross-document synthesis
Use GPT-4o-mini or Claude 3 Haiku for structured JSON extraction from single documents (<$0.001 per doc, >95% F1). Reserve o1/o3 for synthesis across >5 documents that requires contradiction detection or temporal reasoning (2-3x F1 improvement on claims spanning >10k tokens).
Journey Context:
Long-context benchmarks (RULER, LongBench) show that instruct models excel at needle-in-haystack retrieval and structured extraction within single documents up to 200k tokens, at a cost of $0.001-0.01 per 100k tokens. However, when a task requires comparing claims across multiple long documents (e.g., 'Does contract A contradict section 3 of contract B?'), instruct models suffer from the 'lost in the middle' effect and make reasoning errors. Reasoning models maintain higher accuracy on multi-hop reasoning over long contexts. The cost cliff: reasoning models cost 10-30x more per token, which makes them prohibitive for high-volume extraction (1000s of docs/day). Signal that a reasoning model is needed: if the answer requires resolving contradictions between sources, or establishing temporal ordering, across >5 documents, use a reasoning model; otherwise use an instruct model.
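
The routing rule above can be sketched in a few lines of Python. This is a minimal illustration, not a tested production router: the `Task` fields, the `choose_tier` and `daily_cost` helpers, and the default 10x multiplier are hypothetical names and values chosen to mirror the thresholds stated in this report (more than 5 documents, 10-30x per-token cost).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a document-processing job."""
    num_documents: int
    needs_contradiction_check: bool = False
    needs_temporal_ordering: bool = False


# Threshold from the report: reasoning pays off across more than 5 documents.
MIN_DOCS_FOR_REASONING = 5


def choose_tier(task: Task) -> str:
    """Return 'reasoning' only when the report's signal is present."""
    cross_doc_reasoning = (
        task.needs_contradiction_check or task.needs_temporal_ordering
    )
    if cross_doc_reasoning and task.num_documents > MIN_DOCS_FOR_REASONING:
        return "reasoning"
    return "instruct"


def daily_cost(
    docs_per_day: int,
    instruct_cost_per_doc: float,
    tier: str,
    reasoning_multiplier: float = 10.0,  # assumption: low end of the 10-30x range
) -> float:
    """Rough daily spend in dollars, scaling the instruct cost for reasoning."""
    multiplier = reasoning_multiplier if tier == "reasoning" else 1.0
    return docs_per_day * instruct_cost_per_doc * multiplier


if __name__ == "__main__":
    extraction = Task(num_documents=1)
    synthesis = Task(num_documents=8, needs_contradiction_check=True)
    for task in (extraction, synthesis):
        tier = choose_tier(task)
        print(tier, daily_cost(1000, 0.001, tier))
```

With the per-document figure from the summary ($0.001) and 1000 docs/day, the same workload costs roughly 10 to 30 times more on the reasoning tier, which is why the rule defaults to the instruct tier unless both conditions hold.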
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:03:14.596569+00:00— report_created — created