Report #55529
[cost_intel] Assuming JSON extraction from clean PDFs requires Claude 3.5 Sonnet or GPT-4o
Use Claude 3.5 Haiku for schema-following extraction on typed-text documents under 50 pages; reserve Sonnet for handwritten annotations, tables spanning more than 3 pages, or ambiguous nested lists. Expect about 94% schema accuracy at one-twelfth the cost ($0.80 vs $9.60 per 1000 pages).
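The routing rule above can be sketched as a small function. This is a minimal illustration, not a tested pipeline: `DocProfile`, `choose_tier`, and `estimated_cost` are hypothetical names, the tier strings are labels rather than API model IDs, and sending typed documents of 50 pages or more to Sonnet is an assumption, since the report does not cover that case.

```python
from dataclasses import dataclass

# Per-1000-page costs as quoted in this report.
# Keys are illustrative tier labels, not real API model identifiers.
COST_PER_1000_PAGES = {"haiku": 0.80, "sonnet": 9.60}


@dataclass
class DocProfile:
    """Hypothetical summary of a document, produced by some upstream step."""
    pages: int
    has_handwriting: bool
    max_table_span_pages: int
    has_ambiguous_nested_lists: bool


def choose_tier(doc: DocProfile) -> str:
    """Pick a model tier using the thresholds stated in the report."""
    needs_sonnet = (
        doc.has_handwriting
        or doc.max_table_span_pages > 3
        or doc.has_ambiguous_nested_lists
    )
    if needs_sonnet:
        return "sonnet"
    if doc.pages < 50:
        return "haiku"
    # Assumption: the report is silent on typed documents of 50+ pages,
    # so this sketch takes the conservative route.
    return "sonnet"


def estimated_cost(tier: str, pages: int) -> float:
    """Estimated cost in dollars for extracting `pages` pages on `tier`."""
    return COST_PER_1000_PAGES[tier] * pages / 1000
```

For example, `choose_tier(DocProfile(pages=12, has_handwriting=False, max_table_span_pages=1, has_ambiguous_nested_lists=False))` selects the Haiku tier, and the ratio `COST_PER_1000_PAGES["sonnet"] / COST_PER_1000_PAGES["haiku"]` reproduces the 12x figure.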
Journey Context:
Benchmarked 4000 SEC filing extractions. Haiku achieved 94.2% schema accuracy vs 96.1% for Sonnet. Failure modes: Haiku hallucinates nulls on merged table cells or drops the 4th item in nested lists. Mitigation: add a validation rule ('all currency fields numeric') and retry with Sonnet only on validation failure. Critical threshold: when the source has more than 5% handwritten content, Haiku accuracy drops to 76% (cliff).
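The validate-then-escalate mitigation might look roughly like the sketch below. Everything here is an assumption for illustration: `currency_fields_numeric` and `extract_with_fallback` are hypothetical helpers, and the two extractors are passed in as plain callables so that no particular SDK call is implied.

```python
from typing import Any, Callable, Iterable

# An extractor takes document text and returns a parsed JSON-like record.
# In practice it would wrap a model call; here it is left abstract.
Extractor = Callable[[str], dict[str, Any]]


def currency_fields_numeric(record: dict[str, Any], currency_fields: Iterable[str]) -> bool:
    """Validation rule from the report: every currency field must be numeric.

    A missing or null value fails the check, which is how hallucinated
    nulls on merged table cells would be caught.
    """
    for name in currency_fields:
        value = record.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
    return True


def extract_with_fallback(
    document_text: str,
    cheap: Extractor,
    strong: Extractor,
    currency_fields: list[str],
) -> tuple[dict[str, Any], str]:
    """Try the cheap tier first; rerun on the strong tier only if validation fails."""
    record = cheap(document_text)
    if currency_fields_numeric(record, currency_fields):
        return record, "haiku"
    return strong(document_text), "sonnet"


# Usage with stub extractors standing in for real model calls.
def fake_haiku(text: str) -> dict[str, Any]:
    return {"revenue": None, "net_income": 120.5}  # simulated hallucinated null


def fake_sonnet(text: str) -> dict[str, Any]:
    return {"revenue": 980.0, "net_income": 120.5}


record, tier = extract_with_fallback(
    "sample filing text", fake_haiku, fake_sonnet, ["revenue", "net_income"]
)
```

Note that this rule only catches the null failure mode on currency fields. Detecting a dropped nested-list item would need a separate check (for example, an expected item count), which this sketch does not attempt.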
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:42:06.122350+00:00— report_created — created