Report #92045
[cost\_intel] When does fine-tuning beat few-shot prompting on cost per quality point?
Fine-tune GPT-4o-mini or GPT-3.5-turbo only when you have greater than 10,000 identical-schema invocations per month and at least 500 high-quality training examples. Below this volume, few-shot prompting GPT-4o-mini is cheaper and avoids training drift maintenance costs.
Journey Context:
Fine-tuning incurs upfront training costs \($30-$200\) and ongoing inference costs. The per-token savings versus base models only break even at high volume. Additionally, fine-tuned models exhibit worse out-of-distribution robustness than prompted base models, requiring monitoring infrastructure that adds hidden operational cost. The crossover point is typically 10k\+ monthly calls for structured extraction tasks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:05:21.175245+00:00— report_created — created