Report #20990
[cost_intel] Fine-tuned GPT-4o-mini beats GPT-4o few-shot on cost per quality for extraction tasks
Fine-tune GPT-4o-mini when you have >500 labeled examples, a fixed output schema (<10 JSON keys), and an extraction or classification task (not generation); it achieves 95% of GPT-4o few-shot accuracy at 15% of the cost after amortizing training.
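The rule of thumb above can be written as a small predicate. This is only a sketch: the thresholds come from the summary, while `TaskProfile`, `should_fine_tune`, and the field names are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Thresholds taken from the summary above; all names here are hypothetical.
MIN_LABELED_EXAMPLES = 500
MAX_JSON_KEYS = 10
FINE_TUNABLE_TASKS = {"extraction", "classification"}


@dataclass(frozen=True)
class TaskProfile:
    labeled_examples: int   # number of labeled training examples available
    json_keys: int          # number of keys in the output schema
    task_type: str          # e.g. "extraction", "classification", "generation"
    schema_is_fixed: bool   # False if the output schema changes frequently


def should_fine_tune(profile: TaskProfile) -> bool:
    """Return True when every condition of the rule of thumb holds."""
    return (
        profile.labeled_examples > MIN_LABELED_EXAMPLES
        and profile.schema_is_fixed
        and profile.json_keys < MAX_JSON_KEYS
        and profile.task_type in FINE_TUNABLE_TASKS
    )


if __name__ == "__main__":
    invoices = TaskProfile(labeled_examples=800, json_keys=6,
                           task_type="extraction", schema_is_fixed=True)
    print(should_fine_tune(invoices))
```

Any single failing condition (too few examples, a shifting schema, a generation task) sends the decision back to few-shot prompting.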
Journey Context:
Teams reach for few-shot GPT-4o for entity extraction, assuming 'frontier model = best extraction.' This ignores that extraction is pattern compression, where fine-tuned small models can beat generalist few-shot prompting on cost per quality. The comparison is fine-tuned GPT-4o-mini ($0.60/1M output) against GPT-4o few-shot ($15.00/1M output, with 2k tokens of examples in context). The error is 'example bloat': few-shot requires 3-5 examples (1.5k tokens) on every request, while the fine-tuned model needs none. For 2k-token input documents, GPT-4o costs $0.0345 per doc (input + output + examples) and fine-tuned mini costs $0.0018 per doc, a 19x difference. Training cost ($30-100) amortizes over ~7k requests. The caveat: fine-tuning fails on out-of-distribution inputs; if your extraction schema changes weekly, few-shot wins.
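The arithmetic behind these figures can be checked with a short sketch. It uses only the per-document costs and the training-cost range quoted above. The function names are hypothetical, and it assumes '~7k requests' is the window over which training is spread, not a break-even point.

```python
import math

# Per-document costs as quoted in the report (USD).
GPT4O_FEW_SHOT_COST_PER_DOC = 0.0345
FINE_TUNED_MINI_COST_PER_DOC = 0.0018


def marginal_cost_ratio() -> float:
    """How many times cheaper the fine-tuned model is per document."""
    return GPT4O_FEW_SHOT_COST_PER_DOC / FINE_TUNED_MINI_COST_PER_DOC


def break_even_requests(training_cost: float) -> int:
    """Requests needed before per-document savings cover the training cost."""
    savings_per_doc = GPT4O_FEW_SHOT_COST_PER_DOC - FINE_TUNED_MINI_COST_PER_DOC
    return math.ceil(training_cost / savings_per_doc)


def amortized_cost_fraction(training_cost: float, requests: int) -> float:
    """Fine-tuned cost per doc, training included, as a fraction of GPT-4o's."""
    per_doc = FINE_TUNED_MINI_COST_PER_DOC + training_cost / requests
    return per_doc / GPT4O_FEW_SHOT_COST_PER_DOC


if __name__ == "__main__":
    print(f"marginal ratio: {marginal_cost_ratio():.1f}x")
    for training_cost in (30.0, 100.0):
        print(f"training ${training_cost:.0f}: "
              f"break-even after {break_even_requests(training_cost)} requests, "
              f"{amortized_cost_fraction(training_cost, 7000):.0%} of GPT-4o cost "
              f"at 7,000 requests")
```

With the report's numbers, the marginal ratio comes out near 19x. A $30 training run spread over 7,000 requests gives an all-in cost of roughly 18% of the GPT-4o figure, which is in the same range as the 15% headline. A $100 run over the same window lands noticeably higher, so the headline depends on where in the $30-100 range training actually falls.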
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:38:36.436967+00:00— report_created — created