Report #77406
[cost\_intel] When does fine-tuning GPT-3.5 beat GPT-4 few-shot on cost per quality point?
Fine-tune GPT-3.5 Turbo for binary or 3-way classification at >50k requests/month with <200 training examples; it runs at about 1/20th the monthly cost of GPT-4 few-shot, and the one-time training spend is recovered by month 2.
Journey Context:
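Pricing: GPT-4 few-shot \(n=3\) costs $30/1M input tokens \+ $60/1M output tokens. Fine-tuned GPT-3.5 Turbo costs $3/1M input \+ $6/1M output at inference, plus $0.008/1k training tokens \(one-time, amortized\).
Worked example: at 100k calls/month with 1k input tokens each, GPT-4 comes to about $9k/month and fine-tuned 3.5 to about $450/month plus $8k one-time training. Monthly savings are $8,550, so the $8k is recovered in roughly one month and fine-tuning is clearly ahead by month 2. \(Input tokens alone are $3,000 and $300 at the listed rates; the rest of each monthly total is not itemized here.\)
Quality signature: fine-tuned 3.5 hallucinates less on in-distribution data but fails under distribution shift; GPT-4 generalizes better on edge cases. Use fine-tuning only when the input distribution is static \(e.g., support ticket classification\).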
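The break-even arithmetic above can be checked with a small calculator. This is a minimal sketch, not a pricing tool: the per-token rates are the ones quoted in this report, while the names `Pricing`, `monthly_cost`, and `breakeven_months` are hypothetical helpers introduced here for illustration, and output token counts are left as a parameter because the report does not state them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    """List prices in USD per 1M tokens (rates as quoted in this report)."""
    input_per_million: float
    output_per_million: float


# Rates taken from the report; verify against current pricing before relying on them.
GPT4 = Pricing(input_per_million=30.0, output_per_million=60.0)
FT_GPT35 = Pricing(input_per_million=3.0, output_per_million=6.0)


def monthly_cost(pricing: Pricing, calls: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Monthly inference cost in USD for a fixed per-call token profile."""
    total_in = calls * input_tokens
    total_out = calls * output_tokens
    return (total_in * pricing.input_per_million
            + total_out * pricing.output_per_million) / 1_000_000


def breakeven_months(one_time_cost: float, baseline_monthly: float,
                     alternative_monthly: float) -> Optional[float]:
    """Months until the one-time cost is repaid by monthly savings.

    Returns None when the alternative is not cheaper per month,
    since the one-time cost is then never recovered.
    """
    savings = baseline_monthly - alternative_monthly
    if savings <= 0:
        return None
    return one_time_cost / savings
```

With the report's monthly totals, `breakeven_months(8_000, 9_000, 450)` evaluates to roughly 0.94, which sits inside the two-month window the report cites. Calling `monthly_cost` with your own measured input and output token counts is a more reliable basis for the comparison than the rounded monthly figures.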
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:31:25.405277+00:00— report_created — created