Report #92105
[cost_intel] When does fine-tuning GPT-4o-mini beat prompting GPT-4o for high-volume classification?
Fine-tune mini when you have >50k examples, need <100ms latency, and the task is narrow (single label, <20 classes); expect roughly a 10x cost reduction and 2x lower latency than GPT-4o, with an accuracy drop under 3%.
Journey Context:
Teams often assume a bigger model always means better accuracy. For narrow classification (sentiment, intent, topic), fine-tuned small models often match or beat zero-shot large models. The economics: GPT-4o costs $2.50/MTok input and GPT-4o-mini costs $0.15/MTok input, but the bigger win is latency and throughput. Fine-tuning also builds in format adherence (no JSON mode needed) and cuts token count by eliminating few-shot examples from the prompt. At 1B input tokens/month, GPT-4o costs about $2,500 and fine-tuned mini about $150, plus amortized training cost. These figures use base list prices; fine-tuned inference is typically billed above the base mini rate, so check current pricing before relying on the ratio. Critical constraint: fine-tuning fails on out-of-distribution inputs and on tasks that require reasoning, because it memorizes patterns rather than reasoning.
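The cost arithmetic above can be reproduced with a minimal sketch. It uses only the two input prices and the 1B tokens/month volume quoted in this report; the helper names are illustrative, and the training cost and amortization window are hypothetical placeholders, not measured values. Output tokens and any fine-tuned inference surcharge are deliberately left out, so treat the result as a rough upper bound on savings.

```python
def monthly_input_cost(tokens_per_month: float, usd_per_mtok: float) -> float:
    """Input-token cost in USD for one month at a per-million-token price."""
    return tokens_per_month / 1_000_000 * usd_per_mtok


def amortized_monthly_cost(
    inference_usd: float, training_usd: float, months: int
) -> float:
    """Monthly inference cost plus a one-off training cost spread over `months`."""
    if months <= 0:
        raise ValueError("months must be a positive integer")
    return inference_usd + training_usd / months


# Figures quoted in the report.
TOKENS_PER_MONTH = 1_000_000_000   # 1B input tokens/month
GPT_4O_USD_PER_MTOK = 2.50         # GPT-4o input price
MINI_USD_PER_MTOK = 0.15           # GPT-4o-mini base input price

# Hypothetical placeholders: substitute your own training bill and horizon.
TRAINING_USD = 600.0
AMORTIZE_MONTHS = 6

gpt_4o_cost = monthly_input_cost(TOKENS_PER_MONTH, GPT_4O_USD_PER_MTOK)
mini_cost = monthly_input_cost(TOKENS_PER_MONTH, MINI_USD_PER_MTOK)
mini_total = amortized_monthly_cost(mini_cost, TRAINING_USD, AMORTIZE_MONTHS)

print(f"GPT-4o prompting:         ${gpt_4o_cost:,.2f}/month")
print(f"mini, inference only:     ${mini_cost:,.2f}/month")
print(f"mini, training amortized: ${mini_total:,.2f}/month")
print(f"cost ratio:               {gpt_4o_cost / mini_total:.1f}x")
```

Rerunning this with your actual prompt sizes matters more than the list prices: if fine-tuning lets you drop few-shot examples, `TOKENS_PER_MONTH` for the mini path is smaller than for the GPT-4o path, which widens the gap further.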
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:11:22.360004+00:00— report_created — created