Report #40157
[cost_intel] Fine-tuning GPT-4o-mini vs few-shot GPT-4o for classification cost-per-quality
Fine-tune GPT-4o-mini only if you have more than 3000 labeled examples and a stable task distribution; otherwise, use few-shot GPT-4o with 5 examples, which avoids the training cost and is more accurate in low-data regimes.
Journey Context:
Fine-tuning trades a one-time training cost (typically $5-20) for much cheaper inference. Few-shot prompting with a frontier model has zero training cost but is expensive per call. For a binary classifier, few-shot GPT-4o costs about $0.03/query and fine-tuned GPT-4o-mini about $0.0006/query, and training on 3k examples costs about $4. On cost alone, break-even is therefore at ~140 queries: $4 / ($0.03 - $0.0006) ≈ 136. The ~3000 threshold is about labeled examples and quality, not query volume: with <1000 examples, fine-tuned mini overfits and its accuracy drops about 10% below few-shot 4o. For small data, few-shot wins; for large, stable data, fine-tuning wins. If your distribution drifts monthly, the fine-tuning spend becomes a recurring sunk cost.
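The cost side of this comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the prices are the per-query and training figures quoted in this report, not values read from any pricing API. It covers cost only and says nothing about the accuracy gap in low-data regimes.

```python
import math


def break_even_queries(training_cost: float,
                       few_shot_cost_per_query: float,
                       fine_tuned_cost_per_query: float) -> float:
    """Number of queries at which the fine-tuned model's total cost
    equals the few-shot model's total cost."""
    saving_per_query = few_shot_cost_per_query - fine_tuned_cost_per_query
    if saving_per_query <= 0:
        raise ValueError("fine-tuned inference must be cheaper per query")
    return training_cost / saving_per_query


def total_cost(queries: int, cost_per_query: float,
               fixed_cost: float = 0.0) -> float:
    """Fixed (training) cost plus per-query inference cost."""
    return fixed_cost + queries * cost_per_query


if __name__ == "__main__":
    # Figures quoted in this report (assumptions, not live prices).
    TRAINING_COST = 4.00   # USD, fine-tuning on ~3k examples
    FEW_SHOT = 0.03        # USD per query, few-shot GPT-4o
    FINE_TUNED = 0.0006    # USD per query, fine-tuned GPT-4o-mini

    n = break_even_queries(TRAINING_COST, FEW_SHOT, FINE_TUNED)
    print(f"break-even after about {math.ceil(n)} queries")

    for queries in (100, 1_000, 10_000):
        few = total_cost(queries, FEW_SHOT)
        tuned = total_cost(queries, FINE_TUNED, TRAINING_COST)
        print(f"{queries:>6} queries: few-shot ${few:.2f}, fine-tuned ${tuned:.2f}")
```

If the distribution drifts and the model is retrained every month, the same check applies per retraining period: the monthly query volume has to exceed the break-even count for each retrain to pay for itself.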
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:52:39.564089+00:00— report_created — created