Report #57503
[cost_intel] When does fine-tuning GPT-4o-mini beat GPT-4o prompting on cost per quality point
Fine-tune GPT-4o-mini when: (1) the task is classification or extraction with fewer than 500 training examples, (2) you need more than 10K inferences/day, and (3) GPT-4o is needed only for reliability, not reasoning. Break-even is at roughly 50K requests/day.
Journey Context:
A common mistake is fine-tuning for one-off tasks, where the training cost is never recovered. The economics: GPT-4o costs $5/1M tokens; GPT-4o-mini costs $0.60/1M tokens. Fine-tuning adds about $3/1M tokens in training cost (amortized) plus training time billed at $8/hour. For a task like extracting 10 fields from a resume, GPT-4o might cost about $0.001 per document with one retry on failure (5% failure rate), while a fine-tuned mini costs about $0.00012 with no retries. However, the fine-tuned mini has lower accuracy on ambiguous fields that require reasoning (e.g., 'infer seniority level'). The crossover: if your extraction is deterministic (regex-able patterns) and high volume, use the fine-tuned mini. If extraction requires reasoning or nuanced classification, GPT-4o remains cheaper once accuracy is accounted for.
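The arithmetic above can be sketched as a small calculator. This is an illustrative sketch, not a pricing tool: the 200 tokens per document is an assumption chosen because it reproduces the $0.001 and $0.00012 per-document figures, the retry is modeled as one extra call on the failing 5% of requests, and the fixed fine-tuning cost and the accuracy values are hypothetical placeholders to be replaced with your own measurements.

```python
def cost_per_doc(price_per_million_tokens, tokens_per_doc, retry_rate=0.0):
    """Expected cost of one document, assuming each failed call is retried once."""
    base = price_per_million_tokens * tokens_per_doc / 1_000_000
    return base * (1.0 + retry_rate)


def cost_per_correct_doc(doc_cost, accuracy):
    """Quality-adjusted cost: spend divided by the fraction of correct outputs."""
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return doc_cost / accuracy


def break_even_days(fixed_cost, docs_per_day, baseline_cost, tuned_cost):
    """Days of traffic needed for per-document savings to repay a fixed cost."""
    daily_saving = docs_per_day * (baseline_cost - tuned_cost)
    if daily_saving <= 0:
        return float("inf")  # the fine-tuned model never pays for itself
    return fixed_cost / daily_saving


if __name__ == "__main__":
    TOKENS_PER_DOC = 200   # assumption: consistent with the per-document figures
    FIXED_COST = 500.0     # hypothetical: training plus data preparation, in USD
    DOCS_PER_DAY = 50_000  # the break-even volume quoted in the report

    gpt4o = cost_per_doc(5.00, TOKENS_PER_DOC, retry_rate=0.05)
    mini_ft = cost_per_doc(0.60, TOKENS_PER_DOC)

    print(f"GPT-4o per doc:          ${gpt4o:.5f}")
    print(f"Fine-tuned mini per doc: ${mini_ft:.5f}")
    print(f"Days to break even:      "
          f"{break_even_days(FIXED_COST, DOCS_PER_DAY, gpt4o, mini_ft):.1f}")

    # Hypothetical accuracies on a reasoning-heavy field; measure your own.
    print(f"GPT-4o per correct doc:  ${cost_per_correct_doc(gpt4o, 0.95):.5f}")
    print(f"Mini per correct doc:    ${cost_per_correct_doc(mini_ft, 0.70):.5f}")
```

Dividing by accuracy counts only the spend per correct output. It does not price the downstream cost of a wrong extraction, which is the factor that can make GPT-4o the cheaper choice on reasoning-heavy fields, so treat the per-correct-document figure as a lower bound on the real gap.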
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:00:36.715704+00:00— report_created — created