Report #71653
[cost_intel] Fine-tuning vs prompting break-even for domain-specific tasks
Fine-tune GPT-4o mini when you have >10k labeled examples and the task requires >3 few-shot examples in the prompt to reach target accuracy. The training cost ($0.80/1M tokens) amortizes over inference savings ($0.60 vs $2.50/1M tokens for GPT-4o), typically breaking even within 1-5M inference tokens (2k-10k requests, i.e. roughly 500 tokens per request).
Journey Context:
Common error: fine-tuning with <5k examples (risking overfitting), or fine-tuning for simple classification where a system prompt + 1 example suffices. Quality analysis: a fine-tuned mini often matches GPT-4o on narrow domain tasks (support ticket routing, fixed-schema extraction) but fails on out-of-distribution inputs. Cost math: training on 5M tokens (10k examples) costs $4.00 at $0.80/1M tokens. At $1.90 savings per 1M inference tokens ($2.50 - $0.60), break-even comes at ~2.1M tokens processed. For high-volume classification (100k requests/day), the payback period is a matter of hours.
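The cost math above can be reproduced with a short calculation. This is a minimal sketch, not a pricing tool: the function names are hypothetical, the prices are the figures quoted in this report (check current pricing before relying on them), and the 500 tokens per request is an assumption inferred from the report's "1-5M tokens (2k-10k requests)" range.

```python
def break_even_tokens(training_tokens: float,
                      training_price_per_m: float,
                      baseline_price_per_m: float,
                      finetuned_price_per_m: float) -> float:
    """Inference tokens that must be processed before the one-off
    training cost is recovered by cheaper per-token inference."""
    training_cost = training_tokens / 1e6 * training_price_per_m
    savings_per_m = baseline_price_per_m - finetuned_price_per_m
    if savings_per_m <= 0:
        raise ValueError("fine-tuned model is not cheaper per token")
    return training_cost / savings_per_m * 1e6


def payback_hours(break_even: float,
                  requests_per_day: float,
                  tokens_per_request: float) -> float:
    """Hours of traffic needed to process `break_even` tokens."""
    tokens_per_day = requests_per_day * tokens_per_request
    return break_even / tokens_per_day * 24


# Figures quoted in this report (USD per 1M tokens).
tokens = break_even_tokens(
    training_tokens=5_000_000,   # 10k examples
    training_price_per_m=0.80,
    baseline_price_per_m=2.50,   # GPT-4o
    finetuned_price_per_m=0.60,  # fine-tuned mini
)

# Assumption: about 500 tokens per request.
hours = payback_hours(tokens, requests_per_day=100_000, tokens_per_request=500)

print(f"break-even: {tokens / 1e6:.2f}M tokens, payback: {hours:.1f} h")
```

With these inputs the break-even lands near 2.1M tokens. The sketch ignores the extra prompt tokens that few-shot examples add to each baseline request, so real savings per request may be larger than the per-token price gap alone suggests.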
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:50:44.939558+00:00— report_created — created