Report #29771
[cost_intel] When does fine-tuning beat few-shot prompting on cost-per-quality for classification tasks?
Fine-tune when (1) the task is narrow classification/extraction with <20 classes, (2) you have >1,000 training examples, (3) inference volume exceeds 100k requests/day, and (4) the latency budget is <500 ms; break-even is at ~50k requests versus GPT-4o few-shot.
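The four conditions above can be written as a simple checklist. This is a minimal sketch: `TaskProfile`, its field names, and `should_fine_tune` are hypothetical names introduced for illustration, and the thresholds are just the rough cut-offs from this report, not hard limits.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical description of a workload; field names are illustrative."""
    num_classes: int
    training_examples: int
    requests_per_day: int
    latency_budget_ms: float
    narrow_task: bool = True  # narrow classification/extraction, not open-ended generation


def should_fine_tune(p: TaskProfile) -> bool:
    """Return True only when all four rule-of-thumb conditions from the report hold."""
    return (
        p.narrow_task
        and p.num_classes < 20               # (1) narrow task with <20 classes
        and p.training_examples > 1_000      # (2) enough labeled data
        and p.requests_per_day > 100_000     # (3) high inference volume
        and p.latency_budget_ms < 500        # (4) tight latency budget
    )
```

For example, `should_fine_tune(TaskProfile(8, 5_000, 250_000, 300))` passes all four checks, while the same profile with only 400 training examples does not.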
Journey Context:
Few-shot with GPT-4o costs ~$10-15 per 1k requests (depending on context length). Fine-tuning GPT-4o-mini costs ~$0.60 per 1k requests plus a $3-8 training cost. At 100k requests/day, that per-request gap works out to roughly $900-1,400/day in savings, so the training cost is paid back in <1 day. Quality curve: fine-tuned small models (3B-8B params) match few-shot large models (70B+) on narrow tasks but fail on edge cases. Common errors: fine-tuning on <500 examples (overfitting) or using fine-tuning for broad creative tasks (poor generalization). Fine-tuned models also lose the 'reasoning' capability of base models on out-of-distribution inputs.
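The cost arithmetic above can be checked with a short sketch. The prices are the report's rough figures, not quoted vendor rates, and the function and constant names are hypothetical. The calculation covers only the per-request prices and the quoted training fee; it does not model data labeling, evaluation, or engineering time, so it will come out lower than any break-even figure that includes those.

```python
# Rough figures taken from the report (USD); treat them as assumptions.
FEW_SHOT_PER_1K = (10.0, 15.0)   # GPT-4o few-shot, low and high estimate
FINE_TUNED_PER_1K = 0.60         # fine-tuned GPT-4o-mini
TRAINING_COST = (3.0, 8.0)       # one-off training fee, low and high estimate


def daily_savings(requests_per_day: int, few_shot_per_1k: float,
                  fine_tuned_per_1k: float = FINE_TUNED_PER_1K) -> float:
    """Savings per day from switching few-shot traffic to the fine-tuned model."""
    return requests_per_day / 1000 * (few_shot_per_1k - fine_tuned_per_1k)


def break_even_requests(training_cost: float, few_shot_per_1k: float,
                        fine_tuned_per_1k: float = FINE_TUNED_PER_1K) -> float:
    """Number of requests after which the training fee alone is recovered."""
    return training_cost / (few_shot_per_1k - fine_tuned_per_1k) * 1000


if __name__ == "__main__":
    for price in FEW_SHOT_PER_1K:
        print(f"few-shot at ${price}/1k: saves ${daily_savings(100_000, price):,.0f}/day")
    # Worst case: highest training fee against the cheapest few-shot price.
    worst = break_even_requests(TRAINING_COST[1], FEW_SHOT_PER_1K[0])
    print(f"training fee recovered after about {worst:,.0f} requests")
```

At 100k requests/day the savings land between about $940 and $1,440 per day, which is consistent with the payback of under one day stated above.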
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:21:48.771728+00:00— report_created — created