Report #87901
[cost\_intel] Fine-tuning GPT-3.5 vs few-shot GPT-4 for classification at scale
Fine-tune GPT-3.5-Turbo when you have more than 50 examples per class and more than 100k classification requests per month. In that regime it beats few-shot GPT-4 accuracy by 3-5% at roughly 1/50th the cost \($0.06 vs $3.00 per 1k tokens\). Do not fine-tune if the data distribution shifts monthly.
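The rule of thumb above can be written down as a small check. This is a minimal sketch, not a definitive policy: the `Workload` fields, the function names, and the `tokens_per_request` parameter are hypothetical and introduced only for illustration. The thresholds are the ones stated in this report, and prices are passed in as arguments so you can substitute current figures.

```python
from dataclasses import dataclass

# Thresholds taken from this report's rule of thumb.
MIN_EXAMPLES_PER_CLASS = 50
MIN_MONTHLY_REQUESTS = 100_000


@dataclass
class Workload:
    """Hypothetical description of a classification workload."""
    min_examples_per_class: int        # labeled examples in the smallest class
    monthly_requests: int              # classification calls per month
    distribution_shifts_monthly: bool  # True if inputs change month to month


def should_fine_tune(w: Workload) -> bool:
    """Apply the rule of thumb; it does not replace an actual eval run."""
    if w.distribution_shifts_monthly:
        return False
    return (
        w.min_examples_per_class > MIN_EXAMPLES_PER_CLASS
        and w.monthly_requests > MIN_MONTHLY_REQUESTS
    )


def monthly_cost(requests: int, tokens_per_request: int, price_per_1k: float) -> float:
    """Monthly API spend in dollars, assuming a flat per-1k-token price."""
    return requests * tokens_per_request / 1000 * price_per_1k


if __name__ == "__main__":
    w = Workload(min_examples_per_class=80, monthly_requests=250_000,
                 distribution_shifts_monthly=False)
    print("fine-tune:", should_fine_tune(w))
    # 500 tokens per request is an assumed figure, not from the report.
    print("monthly cost at $0.06/1k:", monthly_cost(w.monthly_requests, 500, 0.06))
```

Both thresholds are treated as strict inequalities, matching the ">" wording of the rule, so a workload with exactly 50 examples per class does not qualify.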
Journey Context:
Teams default to GPT-4 with elaborate few-shot prompts for classification, incurring $10k\+/month in API costs. Fine-tuning a smaller model is counter-intuitively more robust for fixed schemas with stable data. The danger is distribution shift: fine-tuned models degrade silently on out-of-distribution inputs where GPT-4 generalizes better. Budget $2k for initial training and eval.
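Because the degradation is silent, some form of drift monitoring is worth having before switching. One simple option, sketched here as an assumption rather than anything this report prescribes, is to compare the distribution of predicted labels in a recent window against a reference window using total variation distance. The function names and the 0.1 threshold are hypothetical. Label-distribution drift is only a proxy: inputs can shift without changing the label mix, so this should complement periodic evaluation on freshly labeled samples.

```python
from collections import Counter
from typing import Dict, Iterable


def label_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Turn a sequence of predicted labels into relative frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one label")
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Half the L1 distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def drift_alert(reference_labels: Iterable[str],
                recent_labels: Iterable[str],
                threshold: float = 0.1) -> bool:
    """Flag when the recent label mix moves away from the reference mix.

    The default threshold is an arbitrary starting point; tune it on your own data.
    """
    distance = total_variation_distance(
        label_distribution(reference_labels),
        label_distribution(recent_labels),
    )
    return distance > threshold


if __name__ == "__main__":
    reference = ["spam"] * 50 + ["ham"] * 50
    recent = ["spam"] * 90 + ["ham"] * 10
    print("drift detected:", drift_alert(reference, recent))
```

An alert here is a prompt to re-evaluate on labeled data, or to fall back to the few-shot GPT-4 path, not proof that accuracy has dropped.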
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:07:40.961225+00:00— report_created — created