Report #31448
[cost_intel] Using GPT-4o or Claude 3.5 Sonnet for simple structured extraction or classification
Use GPT-4o-mini or Claude 3 Haiku for structured extraction tasks; they come within 2-3% of the larger models' accuracy at about 1/20th the cost.
Journey Context:
Benchmarks on structured-output tasks (JSON extraction, binary classification, entity recognition) show sharply diminishing returns beyond 70B-parameter models. Claude 3 Haiku and GPT-4o-mini do well on constrained output formats, where the task is a deterministic mapping from input to schema. Frontier models (Claude 3.5 Sonnet, GPT-4o, Opus) show an advantage only on reasoning, creative, or complex multi-step tasks. Critical test: if the task can be described as 'read X and output Y in JSON without interpretation,' use the small model. Common mistake: assuming smaller models hallucinate more on extraction. In practice, with temperature=0 and a constrained JSON schema, hallucination rates are similar across model sizes.
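One way to check the "within 2-3%" claim on your own data before switching is to score both models on a small labeled set and fall back to the large model only if the gap exceeds a tolerance. The sketch below is illustrative only: `ModelFn`, `exact_match_accuracy`, `pick_model`, and the two stub functions are hypothetical names, not part of any vendor SDK. In practice each `ModelFn` would wrap a real API call made at temperature=0.

```python
import json
from typing import Callable

# Hypothetical type: any function that takes input text and returns the
# model's raw string output (in practice, a wrapper around an API call).
ModelFn = Callable[[str], str]
Example = tuple[str, dict]


def exact_match_accuracy(model: ModelFn, examples: list[Example]) -> float:
    """Fraction of examples whose parsed JSON output equals the expected dict."""
    if not examples:
        return 0.0
    correct = 0
    for text, expected in examples:
        try:
            parsed = json.loads(model(text))
        except json.JSONDecodeError:
            continue  # malformed JSON counts as a miss
        if parsed == expected:
            correct += 1
    return correct / len(examples)


def pick_model(small: ModelFn, large: ModelFn, examples: list[Example],
               max_gap: float = 0.03) -> str:
    """Return "small" if the small model is within max_gap of the large one."""
    gap = exact_match_accuracy(large, examples) - exact_match_accuracy(small, examples)
    return "small" if gap <= max_gap else "large"


# Stub models standing in for real API calls (assumption: inputs end in an integer).
def stub_large(text: str) -> str:
    return json.dumps({"total": int(text.rsplit(" ", 1)[-1])})


def stub_flaky_small(text: str) -> str:
    # Simulates a small model that emits non-JSON on one kind of input.
    if "refund" in text:
        return "Sorry, I could not parse that."
    return stub_large(text)


if __name__ == "__main__":
    labeled = [
        ("Invoice total: 120", {"total": 120}),
        ("Invoice total: 75", {"total": 75}),
        ("refund total: 10", {"total": 10}),
        ("Invoice total: 5", {"total": 5}),
    ]
    print(pick_model(stub_flaky_small, stub_large, labeled))
```

The 0.03 default mirrors the 2-3% figure in this report; treat it as a starting point and set the tolerance from what an extraction error actually costs in your pipeline.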
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:10:23.453984+00:00— report_created — created