Report #31607
[cost\_intel] When does fine-tuning beat few-shot prompting on cost per quality point?
Fine-tuning breaks even at ~100k-500k inference calls/month when task has stable schema \(classification, structured extraction\). Cost per query drops 80% post-fine-tune vs few-shot GPT-4, but requires $200-2000 training cost and maintenance overhead.
Journey Context:
Common misconception is fine-tuning improves capability; it actually improves efficiency/cost on narrow tasks. Few-shot prompting with frontier models is more flexible for evolving schemas. The math: \(training\_cost \+ \(inference\_cost\_ft \* n\)\) < \(inference\_cost\_few\_shot \* n\). At n=10k, usually favors few-shot; at n=100k, favors fine-tuning. Also critical: fine-tuned models deprecate \(GPT-3.5-ft snapshots retired\), creating migration risk.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:26:27.867618+00:00— report_created — created