Report #61274
[cost_intel] Break-even volume where fine-tuning GPT-4o-mini beats few-shot prompting for classification
Fine-tuning breaks even at 100k inferences per month for classification tasks with fewer than 10 classes. Below that volume, dynamic few-shot prompting with 3-5 examples per class matches its accuracy at one-fifth of the cost, because it avoids the $300-800 training overhead plus the cost of building a validation set.
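"Dynamic" few-shot means the examples are chosen per request instead of being fixed in the prompt. Below is a minimal sketch of that idea. It assumes a labelled pool of (text, label) pairs and uses word-overlap (Jaccard) similarity as a stand-in for the embedding search a production setup would more likely use. `select_examples` and `build_prompt` are hypothetical helpers, not part of any OpenAI API. The sketch keeps the top `k_per_class` matches for every class, so 10 classes at 3-5 examples each puts 30-50 examples of extra context into every request.

```python
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    # Crude lowercase word set; stands in for an embedding model (assumption).
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    # Overlap of two word sets: 0.0 (disjoint) to 1.0 (identical).
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(query: str, pool: list[tuple[str, str]],
                    k_per_class: int = 3) -> list[tuple[str, str]]:
    """Hypothetical helper: pick the k most similar labelled examples per class."""
    q = tokenize(query)
    scored = defaultdict(list)  # label -> [(similarity, text), ...]
    for text, label in pool:
        scored[label].append((jaccard(q, tokenize(text)), text))
    chosen = []
    for label in sorted(scored):
        # Highest similarity first; ties keep pool order because sorting is stable.
        ranked = sorted(scored[label], key=lambda s: s[0], reverse=True)
        chosen.extend((text, label) for _, text in ranked[:k_per_class])
    return chosen


def build_prompt(query: str, pool: list[tuple[str, str]], k_per_class: int = 3) -> str:
    """Hypothetical helper: assemble the per-request few-shot prompt."""
    labels = ", ".join(sorted({label for _, label in pool}))
    parts = [f"Classify the text into one of: {labels}."]
    for text, label in select_examples(query, pool, k_per_class):
        parts.append(f"Text: {text}\nLabel: {label}")
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    pool = [
        ("refund my invoice please", "billing"),
        ("card charged twice", "billing"),
        ("app crashes on login", "bug"),
        ("button does nothing", "bug"),
    ]
    print(build_prompt("I was charged twice on my card", pool, k_per_class=1))
```

Because the selected examples change with each query, the prompt cannot be cached as one fixed block. Every one of those example tokens is billed on every request, and that recurring cost is what fine-tuning removes.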
Journey Context:
Fine-tuning costs $0.008 per 1k tokens for training and $0.003 per 1k tokens for fine-tuned inference, versus $0.0006 per 1k tokens for the base model. For 10-class classification, the few-shot examples add about 2k tokens of context to every request. Break-even falls at 125k inferences. The fine-tuned mini model also reaches 94% accuracy versus 89% for few-shot, but that gain justifies the cost only at volume. Many teams fine-tune prematurely for low-volume internal tools.
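The break-even point is the one-time fine-tuning cost divided by the saving per request. Below is a sketch of that arithmetic using the report's per-1k-token prices. It rests on several assumptions. It charges one flat price for input and output tokens, as the report quotes them. `training_tokens` means total billed training tokens (examples × epochs). `extra_fixed_cost` is a hypothetical slot for other one-time items, such as building the validation set. The 100-token shared prompt in the demo is also hypothetical.

```python
import math

# USD per 1k tokens as quoted in the report; not checked against a current price list.
BASE_PER_1K = 0.0006      # base GPT-4o-mini inference
FT_INFER_PER_1K = 0.003   # fine-tuned GPT-4o-mini inference
FT_TRAIN_PER_1K = 0.008   # fine-tuning training


def cost(tokens: float, price_per_1k: float) -> float:
    # Flat per-token price; input and output tokens are not distinguished (assumption).
    return tokens / 1000 * price_per_1k


def break_even_inferences(prompt_tokens: int, fewshot_tokens: int,
                          training_tokens: int, extra_fixed_cost: float = 0.0) -> float:
    """Inferences needed before fine-tuning's one-time cost is paid back."""
    fixed = cost(training_tokens, FT_TRAIN_PER_1K) + extra_fixed_cost
    few_shot = cost(prompt_tokens + fewshot_tokens, BASE_PER_1K)  # base model + examples
    fine_tuned = cost(prompt_tokens, FT_INFER_PER_1K)            # tuned model, no examples
    saving = few_shot - fine_tuned
    if saving <= 0:
        # Each tuned request costs at least as much as a few-shot one: never breaks even.
        return math.inf
    return fixed / saving


if __name__ == "__main__":
    # Hypothetical inputs: 100-token shared prompt, 2k tokens of examples,
    # 15M billed training tokens (about $120 at $0.008 per 1k).
    print(round(break_even_inferences(100, 2000, 15_000_000)))
```

With these inputs the formula lands at 125k inferences, the report's figure, although the report does not say which inputs it used. Break-even scales linearly with the fixed cost, so adding $300 of the summary's overhead through `extra_fixed_cost` raises it to 437.5k. The formula also exposes a limit. At the report's prices, a fine-tuned request is cheaper only while the shared prompt stays under 500 tokens (2000 × 0.0006 / (0.003 - 0.0006) = 500). Beyond that, the 5x per-token premium outweighs the dropped examples, and fine-tuning never breaks even. At 125k inferences, the 94% vs 89% accuracy gap amounts to roughly 6,250 fewer misclassifications.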
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:20:00.181326+00:00— report_created — created