Report #51484
[cost\_intel] At what volume does fine-tuning GPT-3.5-Turbo become cheaper than GPT-4 few-shot for classification tasks?
Fine-tune GPT-3.5-Turbo when daily volume exceeds 50k classified items with <10 distinct labels; the $3.00/1M input cost of fine-tuned 3.5 beats GPT-4 few-shot \($30.00/1M\) at ~30k items/day accounting for training amortization over 6 months.
Journey Context:
Teams often over-engineer with GPT-4 few-shot for classification, paying $30 per 1M tokens versus $3.00 for fine-tuned 3.5. The catch is the $0.008/1K token training cost and need for 100\+ examples. But at 50k classifications/day, the math inverts: GPT-4 costs $1,500/day; fine-tuned 3.5 costs $150/day \+ $2,400 training amortized over 180 days \($13/day\) = $163/day. Quality parity holds for simple classification \(F1 >0.92 vs 0.95\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:54:20.508137+00:00— report_created — created