Report #84779
[cost_intel] When do small models (Claude 3.5 Haiku / Gemini Flash) match large models on structured extraction tasks?
Use Haiku/Flash for single-schema JSON extraction when the input context is under 4k tokens, the output is under 500 tokens, and the schema nests fewer than 3 levels deep; expect 95%+ accuracy versus Sonnet/Pro at about 1/10th the cost.
Journey Context:
People assume small models fail at extraction, but when they do fail, the failure mode is instruction following, not parsing. Haiku struggles with multi-step reasoning and tool calling, but for 'extract these 5 fields' with a clear schema, its output is effectively deterministic. The cliff is schema nesting beyond 3 levels or conditional logic in the extraction rules. Alternatives: GPT-4o mini has similar parity but is worse at following negative constraints (e.g., 'exclude fields if X').
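The rule of thumb above can be sketched as a pre-flight routing check. This is a minimal illustration, not a tested policy: the function names (`schema_depth`, `has_conditional_logic`, `choose_tier`), the way depth is counted (a flat object of scalar fields counts as 1), and the list of JSON Schema keywords treated as "conditional" are all assumptions made for this example. The thresholds are the ones quoted in this report. Conditional rules that live in the prompt rather than in the schema will not be caught by this check.

```python
from typing import Any

# Thresholds quoted in this report; treat them as starting points, not guarantees.
MAX_CONTEXT_TOKENS = 4_000
MAX_OUTPUT_TOKENS = 500
MAX_SCHEMA_DEPTH = 3  # route to the small model only when depth is below this

# Assumption: these JSON Schema keywords are a proxy for "conditional logic".
CONDITIONAL_KEYWORDS = {"if", "then", "else", "oneOf", "anyOf", "not"}


def _children(schema: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the sub-schemas under 'properties' and 'items'."""
    kids = [c for c in schema.get("properties", {}).values() if isinstance(c, dict)]
    items = schema.get("items")
    if isinstance(items, dict):
        kids.append(items)
    return kids


def schema_depth(schema: dict[str, Any]) -> int:
    """Count container nesting: scalars are 0, a flat object of scalars is 1."""
    kids = _children(schema)
    if not kids:
        return 0
    return 1 + max(schema_depth(k) for k in kids)


def has_conditional_logic(schema: dict[str, Any]) -> bool:
    """True if this schema node or any sub-schema uses a conditional keyword."""
    if not CONDITIONAL_KEYWORDS.isdisjoint(schema):
        return True
    return any(has_conditional_logic(k) for k in _children(schema))


def choose_tier(schema: dict[str, Any], context_tokens: int, output_tokens: int) -> str:
    """Return 'small' when every condition from the report holds, else 'large'."""
    fits = (
        context_tokens < MAX_CONTEXT_TOKENS
        and output_tokens < MAX_OUTPUT_TOKENS
        and schema_depth(schema) < MAX_SCHEMA_DEPTH
        and not has_conditional_logic(schema)
    )
    return "small" if fits else "large"


# Hypothetical schemas for illustration only.
flat_invoice = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "due_date": {"type": "string"},
    },
}
nested_invoice = {
    "type": "object",
    "properties": {
        "lines": {
            "type": "array",
            "items": {"type": "object", "properties": {"sku": {"type": "string"}}},
        }
    },
}

print(choose_tier(flat_invoice, context_tokens=1200, output_tokens=150))
print(choose_tier(nested_invoice, context_tokens=1200, output_tokens=150))
```

Here the flat schema has depth 1 and routes to the small tier, while the object-array-object schema has depth 3 and falls back to the large tier. Whether an array level should count toward depth is a design choice; adjust the counting if your own failures cluster differently.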
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:53:13.764517+00:00— report_created — created