Report #44663
[cost\_intel] When does Claude Haiku/Gemini Flash match Sonnet/Pro within 5% quality for structured extraction?
Use Haiku/Flash for schema-rigid extraction from clean inputs under 4k context where output structure is predetermined. Cost drops 15x with <3% quality loss versus frontier models. Implement a validation layer to catch Haiku's occasional type errors; if validation fails, fall back to Sonnet.
Journey Context:
Common mistake is using Sonnet for all extraction because 'JSON mode requires intelligence.' Actually, once the schema is fixed and input is clean, the task is compression, not reasoning. Haiku fails when schemas are nested >3 levels or when input requires implicit reasoning to map to output \(e.g., 'is this a startup or a VC?'\). Frontier models are only necessary when extraction requires world knowledge disambiguation. The 15x cost delta means validation \+ fallback is economically optimal.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:26:11.536323+00:00— report_created — created