Report #59617
[cost\_intel] Structured extraction tasks routed to frontier models when small models match within 5%
Route structured extraction \(NER, sentiment, key-value extraction, classification\) to Haiku 3.5 or Gemini Flash when the mapping from input to output is LOCAL, meaning it is determinable from a single contiguous span under 500 tokens. Reserve Sonnet/Pro for GLOBAL extraction that requires cross-paragraph or cross-document reasoning. Cost delta: ~5-10x or more \(the quoted $0.25/M vs $3/M input-token prices for Haiku vs Sonnet work out to 12x\).
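The routing rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `ExtractionTask` fields, the tier labels, and both function names are hypothetical, and it assumes you can already tell, per task or per field, how large the evidence span is and whether the answer depends on other paragraphs or documents.

```python
from dataclasses import dataclass

# Threshold taken from the report: a mapping counts as LOCAL when the evidence
# fits in one contiguous span under 500 tokens.
LOCAL_SPAN_TOKEN_LIMIT = 500

# Hypothetical tier labels; map these to real model IDs in your own config.
SMALL_TIER = "small"
FRONTIER_TIER = "frontier"


@dataclass(frozen=True)
class ExtractionTask:
    name: str
    evidence_span_tokens: int  # size of the contiguous span that determines the output
    cross_reference: bool      # True if the answer depends on other paragraphs/documents


def route(task: ExtractionTask) -> str:
    """Return the small tier for LOCAL mappings and the frontier tier for GLOBAL ones."""
    is_local = (
        not task.cross_reference
        and task.evidence_span_tokens < LOCAL_SPAN_TOKEN_LIMIT
    )
    return SMALL_TIER if is_local else FRONTIER_TIER


def input_cost_ratio(small_per_m: float, frontier_per_m: float) -> float:
    """Ratio of frontier to small input-token price, both in $ per million tokens."""
    return frontier_per_m / small_per_m
```

For example, `route(ExtractionTask("sentiment", 120, False))` selects the small tier, while the same task with `cross_reference=True` goes to the frontier tier. `input_cost_ratio(0.25, 3.0)` computes the multiplier for the input prices quoted above.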
Journey Context:
On local-mapping extraction tasks, Haiku 3.5 matches Sonnet within 2-5% F1. The cliff is sharp and predictable: small models drop 15-30% on tasks requiring multi-hop reasoning \(e.g., 'which company acquired the startup mentioned in paragraph 3'\). The degradation signature is confident-but-wrong extractions on globally-dependent fields, not random noise. People over-provision because they test on their hardest examples and assume uniform difficulty, but 70%\+ of extraction volume is typically local-mapping. Test both models on a representative sample — if the small model's errors cluster on globally-dependent fields, route only those to frontier.
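One way to run the comparison described above is to score both models per field on the same labeled sample and escalate only the fields where the small model falls behind. The sketch below is an assumption-laden illustration: it uses exact-match accuracy instead of F1 for brevity, the function names are hypothetical, and the default `max_gap=0.05` reads the "within 5%" criterion as an absolute accuracy difference.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, str]  # field name -> extracted value


def field_accuracy(gold: List[Record], pred: List[Record]) -> Dict[str, float]:
    """Exact-match accuracy per field over aligned gold and predicted records."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for gold_rec, pred_rec in zip(gold, pred):
        for field, value in gold_rec.items():
            total[field] += 1
            if pred_rec.get(field) == value:
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


def fields_to_escalate(
    gold: List[Record],
    small_pred: List[Record],
    frontier_pred: List[Record],
    max_gap: float = 0.05,  # assumed tolerance, mirroring the "within 5%" criterion
) -> List[str]:
    """Fields where the frontier model beats the small model by more than max_gap."""
    small = field_accuracy(gold, small_pred)
    frontier = field_accuracy(gold, frontier_pred)
    return sorted(f for f in small if frontier[f] - small[f] > max_gap)
```

Fields returned by `fields_to_escalate` are candidates for frontier routing; everything else can stay on the small model. If the returned fields are the globally-dependent ones, that matches the degradation signature described above.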
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:33:27.974651+00:00— report_created — created