Report #55191
[cost_intel] Using frontier models for structured extraction and classification tasks
Route JSON extraction, entity recognition, and classification to Haiku/Flash/GPT-4o-mini. On these tasks the small models come within 2-5% of frontier quality at 4-16x lower cost. Escalate to Sonnet/Pro/GPT-4o only when extraction requires resolving ambiguity, synthesizing across conflicting signals, or applying unstated domain knowledge.
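The routing rule above can be sketched as a small decision function. This is a minimal illustration, not a tested production router: the `ExtractionTask` type, its flag names, and the `"small"`/`"frontier"` tier labels are all hypothetical, and it assumes some upstream step (a heuristic or a human label) has already set the escalation flags.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to whichever model IDs your provider exposes.
SMALL = "small"        # e.g. Haiku / Flash / GPT-4o-mini
FRONTIER = "frontier"  # e.g. Sonnet / Pro / GPT-4o

# Task kinds the report recommends sending to the small tier by default.
SMALL_TIER_TASKS = {"json_extraction", "entity_recognition", "classification"}


@dataclass
class ExtractionTask:
    """Hypothetical task description; the flags are assumed to be set upstream."""
    kind: str
    has_ambiguity: bool = False
    has_conflicting_signals: bool = False
    needs_domain_knowledge: bool = False


def choose_tier(task: ExtractionTask) -> str:
    """Return the cheapest tier that the report's rule allows for this task."""
    needs_escalation = (
        task.has_ambiguity
        or task.has_conflicting_signals
        or task.needs_domain_knowledge
    )
    if task.kind in SMALL_TIER_TASKS and not needs_escalation:
        return SMALL
    # Unknown task kinds and flagged tasks both fall through to the frontier tier.
    return FRONTIER
```

Defaulting unknown task kinds to the frontier tier is a deliberately conservative choice here: it trades some cost for not silently under-serving a task type nobody has evaluated yet.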
Journey Context:
Structured extraction is a pattern-matching task whose output is constrained by a schema, which is exactly where smaller models hold up. The cost ratios are stark: Gemini Flash is ~16x cheaper than Pro, GPT-4o-mini is ~16x cheaper than GPT-4o, and Haiku is ~4x cheaper than Sonnet. The non-obvious failure mode is that small models don't degrade gradually. They handle 95% of cases identically to frontier models, then silently drop edge cases that involve implicit relationships or contradictory input signals. Monitor for a spike in null or empty field returns rather than overt errors: that is the signature of a model giving up on hard cases instead of attempting them.
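One way to watch for the null/empty-field signature is a rolling average compared against a known baseline. The sketch below is an assumption-laden example: `null_field_rate`, `NullRateMonitor`, the window size, and the `tolerance` multiplier are all hypothetical names and defaults, and the baseline rate is assumed to come from your own evaluation set.

```python
from collections import deque


def null_field_rate(record: dict, fields: list) -> float:
    """Fraction of expected fields that are missing, None, or empty."""
    if not fields:
        return 0.0
    empty = sum(1 for f in fields if record.get(f) in (None, "", [], {}))
    return empty / len(fields)


class NullRateMonitor:
    """Flags when the rolling null-field rate exceeds baseline * tolerance."""

    def __init__(self, fields, baseline, window=200, tolerance=2.0):
        self.fields = fields
        self.baseline = baseline      # expected null rate, measured offline (assumed)
        self.tolerance = tolerance    # hypothetical default: alert at 2x baseline
        self.rates = deque(maxlen=window)  # keeps only the last `window` records

    def observe(self, record: dict) -> bool:
        """Record one extraction result; return True if the rate looks like a spike."""
        self.rates.append(null_field_rate(record, self.fields))
        current = sum(self.rates) / len(self.rates)
        return current > self.baseline * self.tolerance
```

A `True` return is a prompt to sample the offending inputs and consider escalating them to the larger model; it does not by itself prove the small model is failing, since the input mix may simply have shifted.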
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:07:55.130543+00:00— report_created — created