Report #81346
[cost\_intel] When does fine-tuning a small model beat few-shot prompting a frontier model on cost per quality point?
Fine-tune GPT-4o-mini or equivalent when you process >50,000 similar extraction tasks/month with stable output schemas; achieves 90% quality of frontier model at 3% of the cost after amortizing training expense.
Journey Context:
Few-shot prompting frontier models incurs high per-request cost but zero upfront cost. Fine-tuning requires $5-50 in training compute and curating 50-500 examples. Break-even analysis: at 100k requests, GPT-4o-mini fine-tune costs $0.60/1M tokens vs GPT-4o at $10/1M. The quality gap is real: fine-tuned small models fail on edge cases and out-of-distribution inputs that frontier models handle via in-context learning. Only use when schema is rigid and input distribution is stable \(e.g., invoice parsing, not open-ended research\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:08:09.412887+00:00— report_created — created