Report #71227
[cost_intel] Haiku/Flash quality cliff on structured extraction tasks
Use Haiku/Flash for single-paragraph key-value extraction (names, dates, amounts, categories) where the answer is explicitly stated in the source text. Switch to Sonnet/Pro when extraction requires resolving pronouns across paragraphs, inferring implicit relationships, or applying domain-specific reasoning rules. The degradation signature is a silent recall drop: smaller models omit entities rather than hallucinate them, so measure recall, not just precision.
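The routing rule above can be sketched as a small function. This is an illustration only: the `ExtractionTask` flags and the tier names `"small"` / `"large"` are hypothetical labels, not part of any vendor API, and how a task gets flagged is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class ExtractionTask:
    # Hypothetical flags a caller would set per task type.
    cross_paragraph_references: bool = False  # e.g. pronouns resolved across paragraphs
    implicit_relationships: bool = False      # answer is inferred, not stated
    domain_rules: bool = False                # needs domain-specific reasoning


def pick_tier(task: ExtractionTask) -> str:
    """Return 'small' (Haiku/Flash class) or 'large' (Sonnet/Pro class)."""
    needs_reasoning = (
        task.cross_paragraph_references
        or task.implicit_relationships
        or task.domain_rules
    )
    return "large" if needs_reasoning else "small"


# Explicit single-paragraph extraction (invoice fields) stays on the small tier.
invoice_tier = pick_tier(ExtractionTask())
# Contract analysis with cross-section references goes to the large tier.
contract_tier = pick_tier(ExtractionTask(cross_paragraph_references=True))
```

Any one flag is enough to escalate, which matches the "or" in the rule: the cheaper tier is reserved for tasks where the answer is stated outright.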
Journey Context:
Claude 3.5 Haiku is ~4x cheaper than Sonnet per token; Gemini 1.5 Flash is ~17x cheaper than Pro. For explicit extraction from structured text (invoices, forms, product listings), small models match frontier models within 2-5% F1. The cliff appears on tasks like contract analysis, where identifying 'the responsible party' requires resolving references across sections: recall drops 15-30% while precision stays stable. Teams often over-provision to Sonnet/Pro for all extraction, spending 4-17x the budget on tasks Haiku/Flash handle fine. The trap is that precision looks fine in spot checks while recall silently degrades. Always benchmark recall separately.
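A minimal sketch of benchmarking recall separately from precision, assuming extractions are compared as exact-match (field, value) pairs against a hand-labeled gold set. The sample entities are made up for illustration, and scoring an empty set as 1.0 is a convention chosen here, not a standard.

```python
def precision_recall(
    predicted: set[tuple[str, str]], gold: set[tuple[str, str]]
) -> tuple[float, float]:
    """Exact-match precision and recall over (field, value) pairs."""
    true_positives = len(predicted & gold)
    # Assumed convention: an empty denominator scores 1.0.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


# Illustrative data only: the small model omits two entities but invents none.
gold = {
    ("responsible_party", "Acme Corp"),
    ("effective_date", "2024-01-01"),
    ("amount", "5000"),
    ("governing_law", "Delaware"),
}
small_model_output = {
    ("effective_date", "2024-01-01"),
    ("amount", "5000"),
}

p, r = precision_recall(small_model_output, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case every returned entity is correct, so a spot check of the output would look clean, while half of the gold entities are missing. Only the recall figure exposes the omission.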
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:08:14.781191+00:00— report_created — created