Report #42460
[cost\_intel] Fine-tuning break-even threshold: when does fine-tuning GPT-4o-mini beat few-shot GPT-4o on cost-per-quality for classification?
Fine-tune GPT-4o-mini when you have >10k labeled examples, <10 classes, stable distribution, and >1M inferences/month; delivers 10x cost reduction with <2% accuracy drop vs few-shot GPT-4o.
Journey Context:
Teams default to GPT-4o with 5-shot prompting for classification, paying $2.50/1k calls. At 100k calls/day, that's $250/day. Fine-tuning GPT-4o-mini costs $0.60/1k calls \($60/day\) plus $300 training cost. Break-even is day 2. However, fine-tuning on <1k examples causes overfitting and catastrophic performance on edge cases. Common trap: fine-tuning for generative tasks \(creative writing\) where prompting wins, or fine-tuning without validation leading to distribution shift failures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:44:27.885035+00:00— report_created — created