Report #70176
[cost_intel] Fine-tuning GPT-3.5 fails ROI versus GPT-4o-mini few-shot under 10M tokens monthly volume
Fine-tune GPT-3.5-turbo only when classification volume exceeds 10M tokens/month, there are more than 20 distinct classes, and the schema is static. Below that volume, use GPT-4o-mini with 5-shot prompting: fine-tuning incurs $200-500 in training costs plus model lock-in, and these outweigh any savings at lower volumes. For dynamic schemas, avoid fine-tuning entirely.
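The rule above can be written as a small decision helper. This is a minimal sketch, not part of the original report: the function name `recommend_approach` and its parameters are hypothetical, and the thresholds are simply the ones stated in this report.

```python
# Hypothetical helper encoding this report's rule of thumb.
# Thresholds (10M tokens/month, 20 classes) are taken from the report text.
VOLUME_THRESHOLD_TOKENS = 10_000_000
CLASS_THRESHOLD = 20


def recommend_approach(monthly_tokens: int, num_classes: int, schema_is_static: bool) -> str:
    """Return 'fine-tune' or 'few-shot' according to the report's rule."""
    # Dynamic schemas rule out fine-tuning regardless of volume.
    if not schema_is_static:
        return "few-shot"
    # Fine-tuning needs both high volume and many classes.
    if monthly_tokens > VOLUME_THRESHOLD_TOKENS and num_classes > CLASS_THRESHOLD:
        return "fine-tune"
    return "few-shot"
```

For example, `recommend_approach(50_000_000, 35, True)` falls in the fine-tune branch, while the same volume with a changing schema does not.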
Journey Context:
Teams fine-tune small models assuming 10x cost savings, but the break-even is steep: training 3 epochs on 50k examples costs ~$300 and locks you into a frozen model version. The list prices look 20x apart (GPT-4o-mini at $0.15/M tokens versus fine-tuned GPT-3.5 at $3.00/M), but per-token price alone does not settle the question: the one-time training cost has to be amortized, which takes 10M+ tokens before any net savings appear. Fine-tuned models also degrade sharply under distribution shift (e.g., new product categories) and then need retraining. The exception is high-volume, stable classification (support ticket routing, content moderation) at 100M+ tokens/month, where latency and throughput also matter. For schemas that change monthly, few-shot with GPT-4o-mini is strictly dominant.
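The amortization argument can be checked with a short break-even calculation. This is a sketch under stated assumptions: the function name and the tokens-per-request figures in the usage note are hypothetical and do not come from the report; only the $300 training cost and the $0.15/M and $3.00/M prices do. It treats training as a one-time cost and ignores retraining, output tokens, and latency.

```python
from typing import Optional


def break_even_requests(
    training_cost_usd: float,
    ft_tokens_per_request: float,   # hypothetical: prompt size without few-shot examples
    ft_price_per_m_tokens: float,
    fs_tokens_per_request: float,   # hypothetical: prompt size including few-shot examples
    fs_price_per_m_tokens: float,
) -> Optional[float]:
    """Requests needed before fine-tuning recovers its training cost.

    Returns None when the fine-tuned model is not cheaper per request,
    in which case the training cost is never recovered.
    """
    ft_cost_per_request = ft_tokens_per_request * ft_price_per_m_tokens / 1_000_000
    fs_cost_per_request = fs_tokens_per_request * fs_price_per_m_tokens / 1_000_000
    saving_per_request = fs_cost_per_request - ft_cost_per_request
    if saving_per_request <= 0:
        return None
    return training_cost_usd / saving_per_request
```

With the report's prices, a hypothetical 100-token fine-tuned prompt against a 1,000-token few-shot prompt gives `None`, because the fine-tuned request still costs more. The fine-tuned prompt has to be more than 20x shorter before any break-even exists, so measure real prompt sizes before committing to training.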
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:22:11.216830+00:00— report_created — created