Report #55729
[cost_intel] Over-provisioning frontier models for structured data extraction and classification tasks
Use Claude 3.5 Haiku or Gemini 2.0 Flash for any task whose output conforms to a predefined schema (JSON extraction, classification, NER, key-value extraction, sentiment analysis). On well-defined schemas these models come within about 2-5 percentage points of Sonnet/Pro on accuracy/F1, at 12x or more lower cost ($0.25/M vs. $3-15/M input tokens). Upgrade only when the task requires judgment about WHAT to extract from ambiguous input.
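The rule above can be sketched as a small routing function. This is a minimal sketch under stated assumptions: the task labels in `SCHEMA_TASKS`, the `Task` fields, and the model names `small-model` and `frontier-model` are hypothetical placeholders rather than provider identifiers, and deciding whether an input is ambiguous is left to the caller. Tasks outside the schema-bound set default to the frontier tier here; that default is an assumption of the sketch, not part of the report's rule.

```python
from dataclasses import dataclass

# Hypothetical labels for the schema-bound task types the report lists.
SCHEMA_TASKS = {"json_extraction", "classification", "ner", "key_value", "sentiment"}

# Hypothetical placeholders; replace with the model identifiers your provider uses.
CHEAP_MODEL = "small-model"        # Haiku- or Flash-class tier
FRONTIER_MODEL = "frontier-model"  # Sonnet- or Pro-class tier


@dataclass
class Task:
    kind: str                 # one of SCHEMA_TASKS, or anything else
    has_fixed_schema: bool    # output must conform to a predefined schema
    input_is_ambiguous: bool  # caller's judgment: must the model decide WHAT to extract?


def pick_model(task: Task) -> str:
    """Return the cheap tier for well-defined schema tasks, else the frontier tier."""
    if task.kind in SCHEMA_TASKS and task.has_fixed_schema and not task.input_is_ambiguous:
        return CHEAP_MODEL
    # Ambiguous input, or a task outside the schema-bound set (assumed default).
    return FRONTIER_MODEL


if __name__ == "__main__":
    print(pick_model(Task("ner", has_fixed_schema=True, input_is_ambiguous=False)))
```

Here `pick_model(Task("classification", True, False))` returns `"small-model"`, while the same task with `input_is_ambiguous=True` escalates to `"frontier-model"`.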
Journey Context:
Structured extraction is fundamentally pattern matching, not open-ended reasoning. Small models from 2024 onward have had extensive JSON and formatting training and are highly competent at schema compliance. The quality cliff for cheaper models appears specifically on tasks that require judgment about ambiguous or conflicting information, not on the mechanics of extraction. If your task has a clear input-output mapping with minimal ambiguity, the cheap model is the right call. The common error is conflating 'business-critical task' with 'needs a frontier model.' If Haiku gets a classification right 94% of the time and Sonnet gets it right 96% of the time, that 2-point improvement costs 12x more. Pay for a frontier model only when the task requires reasoning, not pattern matching.
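The 94% vs. 96% trade-off can be made concrete as the extra spend per additional correct answer. This sketch assumes the $0.25/M and $3/M input prices cited above (the 12x ratio), ignores output tokens, and uses a hypothetical workload of 1,000 items at about 1,000 input tokens each; the function names are illustrative, and you would substitute your own volumes and current prices.

```python
# Prices in USD per million input tokens, as cited in the report.
CHEAP_PRICE_PER_M = 0.25
FRONTIER_PRICE_PER_M = 3.00


def batch_cost(price_per_m: float, items: int, tokens_per_item: int) -> float:
    """Input-token cost in USD for a batch of items (output tokens ignored)."""
    return price_per_m * items * tokens_per_item / 1_000_000


def marginal_cost_per_extra_correct(items: int, tokens_per_item: int,
                                    cheap_acc: float, frontier_acc: float) -> float:
    """Extra USD spent per additional correct answer when upgrading to the frontier model."""
    extra_cost = (batch_cost(FRONTIER_PRICE_PER_M, items, tokens_per_item)
                  - batch_cost(CHEAP_PRICE_PER_M, items, tokens_per_item))
    extra_correct = items * (frontier_acc - cheap_acc)
    return extra_cost / extra_correct


if __name__ == "__main__":
    items, tokens = 1_000, 1_000  # hypothetical workload
    print(f"cheap:    ${batch_cost(CHEAP_PRICE_PER_M, items, tokens):.2f}")
    print(f"frontier: ${batch_cost(FRONTIER_PRICE_PER_M, items, tokens):.2f}")
    print(f"per extra correct answer: "
          f"${marginal_cost_per_extra_correct(items, tokens, 0.94, 0.96):.4f}")
```

With those inputs the batch costs $0.25 on the cheap model and $3.00 on the frontier model, so each of the 20 extra correct answers costs about $0.14. Whether that is worth paying depends on what a single misclassification costs in your pipeline.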
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:02:10.370877+00:00— report_created — created