Report #66190
[cost_intel] Fine-tuning vs few-shot for classification: cost-quality breakeven
Fine-tune GPT-4o-mini when you have more than 10k labeled examples and more than 1k daily queries; this yields about a 90% cost reduction vs GPT-4o few-shot, with a 2-5 percentage-point accuracy gain. Do not fine-tune with fewer than 1k examples; use few-shot with retrieval instead.
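To make the breakeven concrete, here is a minimal sketch of the arithmetic. The two per-query prices are the ones quoted in this report's sentiment-classifier example; the one-time cost (labeling plus training) is not given in the report, so `one_time_cost` is a hypothetical input you must supply, and `breakeven_days` is an illustrative helper name.

```python
import math

# Per-query prices quoted in this report's sentiment-classifier example.
FEW_SHOT_COST = 0.006     # $/query, GPT-4o few-shot
FINE_TUNED_COST = 0.0006  # $/query, fine-tuned GPT-4o-mini


def breakeven_days(one_time_cost, daily_queries,
                   few_shot_cost=FEW_SHOT_COST,
                   fine_tuned_cost=FINE_TUNED_COST):
    """Days until per-query savings repay the one-time fine-tuning cost.

    one_time_cost is an assumption (labeling + training spend); the report
    does not state a figure for it.
    """
    saving_per_query = few_shot_cost - fine_tuned_cost
    if saving_per_query <= 0 or daily_queries <= 0:
        return math.inf  # fine-tuning never pays for itself
    return one_time_cost / (saving_per_query * daily_queries)


def cost_reduction(few_shot_cost=FEW_SHOT_COST,
                   fine_tuned_cost=FINE_TUNED_COST):
    """Fractional per-query saving of fine-tuned vs few-shot."""
    return 1 - fine_tuned_cost / few_shot_cost


if __name__ == "__main__":
    # Hypothetical: $2,000 to label and train, at the 1k queries/day threshold.
    days = breakeven_days(one_time_cost=2000, daily_queries=1000)
    print(f"per-query reduction: {cost_reduction():.0%}")
    print(f"breakeven after about {days:.0f} days")
```

At the stated prices the saving is $0.0054 per query, so the payback period scales inversely with query volume: doubling daily queries halves the number of days.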
Journey Context:
Teams assume fine-tuning is for accuracy; it is actually for cost at scale. Example, a sentiment classifier: GPT-4o few-shot (5 examples) costs $0.006/query at 89% accuracy; fine-tuned GPT-4o-mini costs $0.0006/query at 92% accuracy. The hidden cost is preparing the 10k training examples. The failure mode is overfitting: if your data drifts, the fine-tuned model degrades silently, while the few-shot model can adapt when you swap in new examples. The 10k-example threshold is roughly where gradient updates overcome the base model's prior.
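One way to keep the silent-degradation failure mode from staying silent is to spot-check a sample of production predictions against human labels and track rolling accuracy. The sketch below is an assumption-laden illustration, not part of the original report: `DriftMonitor`, the 3-point tolerance, and the window size are all hypothetical choices; only the 92% baseline comes from the example above.

```python
from collections import deque


class DriftMonitor:
    """Rolling-accuracy check over recent human-labeled spot checks."""

    def __init__(self, baseline_accuracy, tolerance=0.03, window=200):
        self.baseline = baseline_accuracy  # e.g. 0.92 from the example above
        self.tolerance = tolerance         # hypothetical allowed drop
        self.results = deque(maxlen=window)  # keeps only the newest results

    def record(self, predicted, actual):
        """Store whether one spot-checked prediction matched its label."""
        self.results.append(predicted == actual)

    def rolling_accuracy(self):
        """Accuracy over the current window, or None if nothing recorded."""
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def drifted(self):
        """True once a full window falls below baseline minus tolerance."""
        if len(self.results) < self.results.maxlen:
            return False  # not enough evidence yet
        return self.rolling_accuracy() < self.baseline - self.tolerance


if __name__ == "__main__":
    monitor = DriftMonitor(baseline_accuracy=0.92, window=10)
    for _ in range(10):
        monitor.record("positive", "positive")
    print(monitor.drifted())
    for _ in range(5):
        monitor.record("positive", "negative")
    print(monitor.rolling_accuracy(), monitor.drifted())
```

When `drifted()` fires, the options implied by the report are to retrain on fresh labels or to fall back to the few-shot path with updated examples.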
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:34:37.466778+00:00— report_created — created