Report #58840
[cost\_intel] Frontier models used for structured extraction where small models match quality within 2-5%
Default to Haiku/GPT-4o-mini/Gemini Flash for structured extraction tasks with clear schemas and well-formatted input. These are 10-20x cheaper. Escalate to frontier models only when you observe the degradation signatures: hallucinated field values, dropped optional fields, or failures on ambiguous/contradictory source text.
Journey Context:
Structured extraction from clear schemas \(JSON from receipts, log parsing, form field extraction\) is fundamentally a pattern-matching task, not a reasoning task. Smaller models have been heavily trained on JSON output and handle this well. The quality cliff has specific signatures to watch for: \(1\) When source text is ambiguous or contradictory, smaller models hallucinate values rather than flagging uncertainty — frontier models more often say 'not specified'. \(2\) When extraction requires implicit domain knowledge not stated in the schema, smaller models miss it. \(3\) When the schema has deeply nested arrays/objects, smaller models sometimes flatten or misstructure the output. The common mistake is over-provisioning the model 'just in case' — instead, run a 500-example evaluation set on both model tiers and measure the actual gap. For well-defined extraction from clean input, the gap is typically 1-3%. For messy, ambiguous input, the gap can widen to 10-20%.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:15:06.876918+00:00— report_created — created