Report #83278
[cost\_intel] GPT-4 few-shot classification costs $0.12 per 1k requests; fine-tuned GPT-3.5 costs $0.008 per 1k requests and is more accurate when trained on >100k examples
Fine-tune GPT-3.5-turbo when you have >50k labeled examples and need latency under 500 ms; use frontier models only for zero-shot tasks or when you have <10k examples. Between 10k and 50k examples, the better choice depends on task complexity.
Journey Context:
Few-shot prompting with frontier models generalizes better from small datasets but costs far more: about 15x per request by the figures above, and 60x per token at the quoted prices \($30 vs $0.50 per million tokens\). Fine-tuning a smaller model on a large proprietary dataset \(>50k examples\) achieves higher accuracy on that specific distribution, with roughly 10x lower latency and 20x lower cost. The break-even falls at around 10k-50k examples, depending on task complexity; below that range, a fine-tuned model tends to overfit and underperform a few-shot frontier model.
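The cost and threshold reasoning above can be sketched as a small Python helper. This is a rough illustration, not a pricing tool: the two per-1k-request costs and the 10k/50k example thresholds are copied from this report, while the function names `cost_usd` and `recommend` are hypothetical and the request volume in the usage note is an arbitrary assumption.

```python
# Per-request costs as stated in this report (USD per 1k requests).
FEW_SHOT_COST_PER_1K = 0.12     # GPT-4 few-shot classification
FINE_TUNED_COST_PER_1K = 0.008  # fine-tuned GPT-3.5

# Example-count thresholds as stated in this report.
FEW_SHOT_MAX_EXAMPLES = 10_000   # below this, prefer few-shot
FINE_TUNE_MIN_EXAMPLES = 50_000  # above this, prefer fine-tuning


def cost_usd(requests: int, cost_per_1k: float) -> float:
    """Inference cost in USD for a given number of requests."""
    return requests / 1000 * cost_per_1k


def recommend(labeled_examples: int) -> str:
    """Apply the report's rule of thumb to a labeled-dataset size."""
    if labeled_examples < FEW_SHOT_MAX_EXAMPLES:
        return "few-shot frontier model"
    if labeled_examples > FINE_TUNE_MIN_EXAMPLES:
        return "fine-tuned GPT-3.5-turbo"
    # 10k-50k is the break-even band; the report says it depends on the task.
    return "depends on task complexity"


if __name__ == "__main__":
    volume = 1_000_000  # hypothetical request volume, not from the report
    few_shot = cost_usd(volume, FEW_SHOT_COST_PER_1K)
    fine_tuned = cost_usd(volume, FINE_TUNED_COST_PER_1K)
    print(f"few-shot: ${few_shot:.2f}, fine-tuned: ${fine_tuned:.2f}")
    print(f"ratio: {few_shot / fine_tuned:.0f}x")
    print(recommend(120_000))
```

At the assumed volume of 1M requests, the report's figures work out to about $120 for few-shot GPT-4 versus about $8 for fine-tuned GPT-3.5. The sketch covers inference cost only; it leaves out one-time training and labeling costs, which the report does not quantify.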
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:22:21.652921+00:00— report_created — created