Report #91690
[cost_intel] Fine-tuning with insufficient data destroying cost-quality ratio
Do not fine-tune until you have more than 2000 high-quality examples and also need consistent output formatting. Below that threshold, few-shot prompting with GPT-4o-mini delivers higher accuracy per dollar, because the training cost is sunk and a small dataset carries overfitting risk.
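The rule above can be written down as a small check. This is only a sketch: the function name `recommend_approach` and the constant `MIN_EXAMPLES` are hypothetical names chosen for illustration, and the 2000-example threshold is the one stated in this report, not a universal constant.

```python
# Threshold taken from this report; treat it as a rule of thumb.
MIN_EXAMPLES = 2000


def recommend_approach(n_examples: int, needs_format_consistency: bool) -> str:
    """Return "fine-tune" only when both conditions from the report hold.

    n_examples: number of high-quality training examples available.
    needs_format_consistency: whether the task requires consistent output format.
    """
    if n_examples > MIN_EXAMPLES and needs_format_consistency:
        return "fine-tune"
    # Either condition missing: stay with few-shot prompting.
    return "few-shot"


if __name__ == "__main__":
    print(recommend_approach(800, True))
    print(recommend_approach(5000, True))
```

Both conditions must hold; a large dataset without a formatting requirement still falls back to few-shot prompting under this reading of the rule.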
Journey Context:
Fine-tuning GPT-3.5-Turbo costs ~$8-12 per job and reduces inference cost by ~50%, but the model needs enough data to generalize. With <1000 examples, fine-tuned models overfit to the training distribution and fail on edge cases where few-shot prompting with frontier models stays robust. At >2000 examples, fine-tuned models achieve 'format lock-in' (99.9% valid output formatting) that prompting cannot guarantee, and the ~50% per-inference savings amortize the training cost within ~50k requests. The failure signature of an under-trained model is high variance on held-out test sets.
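The amortization claim above is simple arithmetic, sketched below. The function names are hypothetical. The inputs are the report's own figures ($8-12 per job, ~50% saving, ~50k requests), and the baseline per-request cost is not stated in the report, so the second function backs it out as an implied value.

```python
import math


def break_even_requests(training_cost_usd: float,
                        baseline_cost_per_request_usd: float,
                        reduction: float = 0.5) -> int:
    """Requests needed before cumulative inference savings cover training cost."""
    saving_per_request = baseline_cost_per_request_usd * reduction
    if saving_per_request <= 0:
        raise ValueError("no per-request saving, so training cost is never recovered")
    return math.ceil(training_cost_usd / saving_per_request)


def implied_baseline_cost(training_cost_usd: float,
                          break_even: int,
                          reduction: float = 0.5) -> float:
    """Baseline per-request cost implied by a stated break-even point."""
    return training_cost_usd / (break_even * reduction)


if __name__ == "__main__":
    # Figures from the report: $8-12 per job, ~50% saving, ~50k requests.
    for cost in (8.0, 12.0):
        baseline = implied_baseline_cost(cost, 50_000)
        print(f"${cost:.0f} job implies a baseline of ${baseline:.5f} per request")
```

Under these assumptions, the report's numbers imply a baseline of roughly $0.00032 to $0.00048 per request. If your actual per-request cost is lower than that, break-even arrives later than ~50k requests.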
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:29:34.432885+00:00— report_created — created