Report #47215
[cost_intel] Fine-tuning cost per quality crossover miscalculation ignoring example efficiency plateau
Fine-tune when you have >1000 high-quality examples AND >10k monthly inferences; below this, few-shot prompting with a larger model is cheaper. Quality plateaus at ~5000 examples; additional data yields <2% gain.
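The rule of thumb above can be sketched as a small helper. This is a minimal illustration, not a tested policy: the function names `recommend_approach` and `more_data_likely_helps` are hypothetical, and the thresholds are copied from this report rather than measured.

```python
# Thresholds taken from the report's rule of thumb (assumptions, not benchmarks).
MIN_EXAMPLES = 1_000            # high-quality training examples
MIN_MONTHLY_INFERENCES = 10_000
PLATEAU_EXAMPLES = 5_000        # point where the report says gains flatten


def recommend_approach(num_examples: int, monthly_inferences: int) -> str:
    """Return "fine-tune" only when both thresholds are strictly exceeded."""
    if num_examples > MIN_EXAMPLES and monthly_inferences > MIN_MONTHLY_INFERENCES:
        return "fine-tune"
    return "few-shot"


def more_data_likely_helps(num_examples: int) -> bool:
    """Per the report, data beyond the plateau yields under 2% gain."""
    return num_examples < PLATEAU_EXAMPLES
```

For example, `recommend_approach(2_000, 50_000)` falls on the fine-tune side, while 500 examples at the same volume stays with few-shot.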
Journey Context:
Standard assumption: more data = better fine-tuning. Reality: instruction fine-tuning hits diminishing returns at about 5,000 examples for most classification/extraction tasks. At 1,000 examples, you achieve 85% of peak performance; at 5,000, 98%; at 20,000, 99%. Cost analysis: fine-tuning GPT-4o-mini costs $0.80 per 1M training tokens plus $0.60 per 1M inference tokens, vs $0.60 per 1M for the base model. With 1M training tokens (500 examples), break-even is at 2M inference tokens. However, few-shot GPT-4o (non-mini) at $5.00 per 1M may be cheaper at low volume than paying the fine-tuning overhead. Rule: <10k monthly calls = few-shot; >10k with stable schema = fine-tune.
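The cost comparison can be sketched as a simple break-even calculation. This is a rough model under stated assumptions: it uses the per-1M-token prices quoted in this report, treats few-shot prompt overhead as a single hypothetical multiplier (`prompt_overhead`), and ignores data preparation, evaluation, and engineering time. Its result therefore need not match the 2M-token figure quoted above.

```python
from typing import Optional

# Prices in USD per 1M tokens, as quoted in the report (check current pricing).
TRAIN_RATE = 0.80       # fine-tuning, per 1M training tokens
FT_INFER_RATE = 0.60    # fine-tuned model inference
FEW_SHOT_RATE = 5.00    # larger model used few-shot


def fine_tune_cost(training_tokens_m: float, inference_tokens_m: float) -> float:
    """One-time training cost plus inference cost (token counts in millions)."""
    return training_tokens_m * TRAIN_RATE + inference_tokens_m * FT_INFER_RATE


def few_shot_cost(inference_tokens_m: float, prompt_overhead: float = 1.0) -> float:
    """Inference-only cost; prompt_overhead scales tokens for in-context examples (assumption)."""
    return inference_tokens_m * FEW_SHOT_RATE * prompt_overhead


def break_even_tokens_m(
    training_tokens_m: float,
    few_shot_rate: float = FEW_SHOT_RATE,
    prompt_overhead: float = 1.0,
) -> Optional[float]:
    """Inference volume (millions of tokens) at which both options cost the same.

    Returns None when few-shot is never more expensive per token,
    in which case the training cost is never recovered.
    """
    saving_per_m = few_shot_rate * prompt_overhead - FT_INFER_RATE
    if saving_per_m <= 0:
        return None
    return training_tokens_m * TRAIN_RATE / saving_per_m
```

Calling `break_even_tokens_m(1.0)` gives the volume at which 1M training tokens pay for themselves against the few-shot rate. Passing `few_shot_rate=0.60` returns `None`, reflecting that a baseline with the same inference price never recovers the training cost.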
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:43:16.904947+00:00— report_created — created