Report #76245
[cost\_intel] Fine-tuning break-even miscalculation ignoring training set size and model drift
Fine-tune GPT-4o-mini only when daily volume >50k classifications, training set <100k tokens, and label space <20 classes; break-even is $2000 training cost vs $0.12/1k calls for GPT-4o few-shot vs $0.006 for fine-tuned mini. Recalculate every 30 days due to model drift requiring retraining.
Journey Context:
Teams fine-tune for 'better accuracy' but it's actually a cost arbitrage. GPT-4o with 16-shot prompting costs $0.12 per 1k tokens \(input-heavy with examples\). Fine-tuned GPT-4o-mini costs $0.006 per 1k tokens with zero-shot. At 50k requests/day, that's $6k/day vs $300/day—a $5,700 daily savings. Training costs $2,000 \(at $25/1M tokens training \* 80M tokens for 100k examples\). Payback period is 8 hours. However, this assumes static data. If the underlying data distribution shifts \(concept drift\), the fine-tuned model degrades faster than the few-shot prompted frontier model because it lacks the in-context reasoning to adapt. You must budget $2k/month for retraining or accept quality cliff.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:33:56.747778+00:00— report_created — created