Report #64670
[cost_intel] Fine-tuning classification models with insufficient data volume
Use few-shot GPT-4o for <1000 examples per class; fine-tune only with >1000 examples per class or with severe class imbalance (minority class <10% of examples)
Journey Context:
Teams often assume that fine-tuning always beats prompting. OpenAI documentation notes that fine-tuning requires substantial datasets to surpass few-shot learning. Empirical cost-quality curves show that for classification, the break-even point against few-shot GPT-4o falls at around 1000 examples per class. Below that, the base model with careful prompting matches fine-tuned accuracy at lower cost. With severe class imbalance (minority class below 10% of examples), however, fine-tuning significantly improves recall on minority classes by adjusting the decision boundary, which justifies the training cost even with marginal data volumes.
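The rule of thumb in this paragraph can be sketched as a small check over a labeled dataset. This is a minimal illustration, not a validated tool: the function name `recommend_approach` and both constants are hypothetical, the thresholds are the approximate figures quoted above, and treating either condition (enough data, or severe imbalance) as sufficient on its own is an assumed reading of the paragraph.

```python
from collections import Counter

# Approximate thresholds quoted in this report; treat them as heuristics,
# not measured constants. Both names are hypothetical.
MIN_EXAMPLES_PER_CLASS = 1000   # rough break-even against few-shot GPT-4o
MINORITY_SHARE_CUTOFF = 0.10    # minority class below 10% of all examples


def recommend_approach(labels):
    """Return 'fine-tune' or 'few-shot' for a list of class labels.

    Assumption: either enough examples in every class or a severely
    under-represented minority class is enough to suggest fine-tuning.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")

    total = sum(counts.values())
    smallest = min(counts.values())  # size of the rarest class

    enough_data = smallest > MIN_EXAMPLES_PER_CLASS
    severe_imbalance = (smallest / total) < MINORITY_SHARE_CUTOFF

    return "fine-tune" if (enough_data or severe_imbalance) else "few-shot"
```

For example, a dataset with 950 examples of one class and 50 of another has a 5% minority share, so this sketch would suggest fine-tuning despite the small volume, while a balanced dataset of 200 examples per class would stay on few-shot prompting.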
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T15:02:03.517751+00:00— report_created — created