Report #47806
[cost_intel] When does fine-tuning GPT-4o-mini beat few-shot Claude 3.5 Haiku on cost per quality point?
Fine-tuning breaks even at roughly 10k-50k requests/month for binary classification; below that volume, few-shot Haiku wins because it avoids training costs and retains flexibility.
Journey Context:
Startups often fine-tune too early, paying $200-500 in training plus $0.60/1M tokens for 4o-mini, versus $0.25/1M tokens for Haiku plus the cost of few-shot examples. The math: fine-tuning removes ~500 tokens of few-shot examples from each request. At 100k requests that is 50M tokens saved, worth $12.50 at the Haiku rate or $30 at the 4o-mini base rate. Training for 4o-mini is billed at roughly 4x the inference rate, about $3.20/1M tokens processed (4 × ~$0.80/1M). 100k examples × 1k tokens = 100M training tokens × $3.20/1M = $320. At $12.50-$30 saved per 100k requests, you need 1M+ cumulative requests to amortize that training run. However, if the task is binary classification with <100 output tokens, the fine-tuned mini has roughly 10x lower latency, which may justify the cost. Quality signature: fine-tuned models show 'overconfidence' on out-of-distribution inputs (confidence >0.9 but wrong), whereas few-shot Haiku shows uncertainty.
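The amortization arithmetic above can be reproduced with a short script. This is a minimal sketch, not a pricing tool: the function names are hypothetical, the prices are the approximate figures quoted in this report (not verified vendor rates), and the model deliberately matches the report's simplification of counting only the few-shot tokens saved, ignoring any per-token price difference on the rest of the prompt.

```python
import math

TOKENS_PER_MILLION = 1_000_000


def training_cost(examples: int, tokens_per_example: int, price_per_m: float) -> float:
    """One-off training cost in dollars for a single pass over the dataset."""
    return examples * tokens_per_example / TOKENS_PER_MILLION * price_per_m


def few_shot_savings(requests: int, few_shot_tokens: int, price_per_m: float) -> float:
    """Dollars saved by no longer sending few-shot examples with each request."""
    return requests * few_shot_tokens / TOKENS_PER_MILLION * price_per_m


def break_even_requests(train_cost: float, few_shot_tokens: int, price_per_m: float) -> float:
    """Cumulative requests needed before the savings cover the training cost."""
    saving_per_request = few_shot_tokens / TOKENS_PER_MILLION * price_per_m
    return train_cost / saving_per_request


if __name__ == "__main__":
    # Figures assumed from the report: 100k examples x 1k tokens at $3.20/1M.
    cost = training_cost(100_000, 1_000, 3.20)
    print(f"training cost: ${cost:.2f}")

    # Savings at 100k requests, valued at each per-token rate from the report.
    for label, price in (("Haiku rate", 0.25), ("4o-mini base rate", 0.60)):
        saved = few_shot_savings(100_000, 500, price)
        needed = break_even_requests(cost, 500, price)
        print(f"{label}: ${saved:.2f} saved per 100k requests, "
              f"break-even at ~{needed / 1e6:.2f}M requests")
```

Under these assumed inputs the break-even lands at a little over 1M requests when the saved tokens are valued at $0.60/1M and at roughly 2.5M when valued at $0.25/1M, which is where the "1M+" figure comes from. Swapping in current vendor prices and your own dataset size is the main reason to keep this as a function rather than a fixed number.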
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:43:47.145842+00:00— report_created — created