Report #75491
[cost_intel] At what monthly volume does fine-tuning GPT-3.5 become cheaper than few-shot GPT-4 prompting?
Fine-tune when: (1) monthly volume exceeds 100k requests, (2) you can remove more than 70% of the few-shot examples from the prompt (token reduction), and (3) the task requires a specific output format (for example, reliable JSON). Training costs $0.008/1k tokens; inference on the fine-tuned model costs about 2x base GPT-3.5. Break-even is typically 50k-100k requests/month.
Journey Context:
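Example: a classification task needs 10 few-shot examples (2k tokens) in the prompt. GPT-4 costs $0.03/1k input tokens, so the prompt alone costs about $0.06 per request; the ~500 output tokens are billed on top of that and are not included in the figure below. Fine-tuned GPT-3.5 costs $0.006/1k output tokens, and the prompt shrinks from 2k to about 200 input tokens per request. At 100k requests/month: GPT-4 = 100k × $0.06 = $6,000. Fine-tuned = ~$50 one-time training (1M tokens; at $0.008/1k this implies about six passes over the data) + 100k × $0.018 inference (200 in + 500 out) = $1,850. Savings are about $4,150 in the first month. At 10k requests/month the same figures give $600 for GPT-4 against $230 for the fine-tuned model ($50 + $180), so the training fee by itself does not make GPT-4 cheaper; it is recovered after roughly 1,200 requests ($50 / $0.042 saved per request). Low volume favors GPT-4 only when costs outside this arithmetic, such as dataset preparation, evaluation, and retraining, are counted. Critical error: fine-tuning when the task requires general reasoning (fine-tuned models can lose general capabilities) or when the prompt cannot be compressed (dynamically retrieved context cannot be baked into the weights). Also, fine-tuned models often need a higher temperature to reach the same output diversity, which increases retry rates.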
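The arithmetic in the example can be sketched as a small calculator. This is a minimal illustration, not an official pricing tool: the `Scenario` class and both function names are hypothetical, and the per-request figures ($0.06, $0.018) and the $50 training fee are the report's own estimates, which should be replaced with current list prices before relying on the result.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Costs in USD. All values are illustrative inputs, not live prices."""
    baseline_cost_per_request: float   # few-shot GPT-4 prompt, e.g. 0.06
    finetuned_cost_per_request: float  # fine-tuned GPT-3.5, e.g. 0.018
    one_time_cost: float               # training fee plus any setup cost


def monthly_costs(s: Scenario, requests: int) -> tuple[float, float]:
    """Return (baseline, fine-tuned) cost for the first month at this volume."""
    baseline = requests * s.baseline_cost_per_request
    finetuned = s.one_time_cost + requests * s.finetuned_cost_per_request
    return baseline, finetuned


def break_even_requests(s: Scenario) -> float:
    """Requests needed to recover the one-time cost; inf if there is no saving."""
    saving = s.baseline_cost_per_request - s.finetuned_cost_per_request
    if saving <= 0:
        return float("inf")
    return s.one_time_cost / saving


if __name__ == "__main__":
    # Figures taken from the example in this report (assumed, not verified).
    report = Scenario(0.06, 0.018, 50.0)
    for volume in (10_000, 100_000):
        base, tuned = monthly_costs(report, volume)
        print(f"{volume:>7} req: GPT-4 ${base:,.0f} vs fine-tuned ${tuned:,.0f}")
    print(f"break-even: {break_even_requests(report):,.0f} requests")
```

Folding dataset preparation or engineering time into `one_time_cost` (an assumption the report does not price) raises the break-even volume proportionally, which is one way to test how sensitive the decision is to costs beyond the training fee.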
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:18:35.213428+00:00— report_created — created