Report #29133
[cost_intel] Fine-tuning is always more cost-effective than few-shot prompting for classification tasks
Fine-tune a smaller base model (GPT-3.5-turbo, Llama-3-8B) only when you have more than 1,000 labeled examples and more than 10k classification calls per day; below those thresholds, few-shot prompting with GPT-4o-mini is cheaper because the training cost cannot be amortized over enough calls.
Journey Context:
Fine-tuning incurs a fixed training cost ($30-300) plus ongoing inference cost on dedicated endpoints. For low-volume classification (<1k calls/day), the training cost outweighs the per-inference savings over few-shot prompting with a frontier model. The crossover point depends on volume: at 10k requests/day, fine-tuned GPT-3.5-turbo beats GPT-4o-mini; below 1k/day, GPT-4o-mini with 5 examples wins. Between those volumes, the outcome depends on the actual per-call prices and on how long the fine-tuned model stays in service. Common error: fine-tuning on 200 examples for a 100-call/day task, which loses money on both training and inference compared with on-demand few-shot prompting.
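The crossover described above can be estimated with a simple break-even calculation. The following is a minimal sketch, not a pricing tool: the function names, the amortization horizon, and every dollar figure in the example are hypothetical placeholders, and real per-call costs depend on your token counts and current provider pricing. It assumes the fine-tuned model is cheaper per call only by a fixed difference, and treats any dedicated-endpoint fee as a flat daily cost.

```python
from typing import Optional


def break_even_daily_calls(
    training_cost: float,
    fewshot_cost_per_call: float,
    finetuned_cost_per_call: float,
    horizon_days: int,
    hosting_cost_per_day: float = 0.0,
) -> Optional[float]:
    """Daily call volume above which fine-tuning becomes cheaper.

    All inputs are assumptions supplied by the caller (USD).
    Returns None when the fine-tuned model saves nothing per call,
    in which case fine-tuning never pays off.
    """
    if horizon_days <= 0:
        raise ValueError("horizon_days must be positive")
    saving_per_call = fewshot_cost_per_call - finetuned_cost_per_call
    if saving_per_call <= 0:
        return None
    # Spread the one-off training cost over the horizon, add any flat hosting fee.
    fixed_cost_per_day = training_cost / horizon_days + hosting_cost_per_day
    return fixed_cost_per_day / saving_per_call


def cheaper_option(
    daily_calls: float,
    training_cost: float,
    fewshot_cost_per_call: float,
    finetuned_cost_per_call: float,
    horizon_days: int,
    hosting_cost_per_day: float = 0.0,
) -> str:
    """Return "fine-tune" or "few-shot" for the given daily volume."""
    threshold = break_even_daily_calls(
        training_cost,
        fewshot_cost_per_call,
        finetuned_cost_per_call,
        horizon_days,
        hosting_cost_per_day,
    )
    if threshold is None or daily_calls < threshold:
        return "few-shot"
    return "fine-tune"


if __name__ == "__main__":
    # Placeholder numbers only: $90 training, 90-day horizon,
    # $0.002 per few-shot call vs $0.001 per fine-tuned call.
    print(break_even_daily_calls(90.0, 0.002, 0.001, 90))
    print(cheaper_option(100, 90.0, 0.002, 0.001, 90))
    print(cheaper_option(10_000, 90.0, 0.002, 0.001, 90))
```

A shorter horizon or a flat endpoint fee pushes the break-even volume up, which is why a 100-call/day task rarely recovers its training cost.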
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:17:39.409368+00:00 - report_created - created