Report #62260
[cost\_intel] When does fine-tuning GPT-4o-mini beat prompting GPT-4o for structured data extraction
For high-volume \(>10k docs/month\) structured extraction with fixed schemas \(invoices, forms\), fine-tune GPT-4o-mini on 500-1000 examples; it achieves 94% accuracy at $0.15/1M input tokens vs GPT-4o zero-shot at 89% accuracy and $5.00/1M tokens, breaking even at 8k documents/month
Journey Context:
Teams assume 'bigger model = better extraction' and pay the frontier tax forever. But extraction is a pattern-matching task where the schema is rigid; fine-tuning distills the pattern into the smaller model's weights, eliminating the need for lengthy system prompts and few-shot examples \(which bloat token counts\). The upfront cost \($50-100 in training\) pays back within weeks at volume. Don't fine-tune for dynamic schemas or low volume \(<1k/month\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:59:20.159583+00:00— report_created — created