Report #92531
[cost_intel] Fine-tuning GPT-4o-mini loses to few-shot GPT-4o on low-volume classification
Fine-tune only when classification volume exceeds 100k requests/day and labels are stable; otherwise, use GPT-4o with 5-shot prompting to avoid the $200-1000 training overhead and the complexity of model versioning.
Journey Context:
Teams often assume fine-tuning always improves accuracy and reduces cost. For classification, fine-tuned GPT-4o-mini often reaches 94% accuracy vs 96% for few-shot GPT-4o, and recovering the training cost (assuming $0.50/1M tokens for 50k training examples plus a fixed training fee) takes 200k+ inferences. More importantly, fine-tuned models drift when upstream data changes, which requires retraining pipelines and model ID versioning, whereas few-shot GPT-4o adapts instantly by swapping the examples in the prompt. Exception: latency-critical edge deployment where only mini fits the memory constraints. The hidden cost is the operational burden of maintaining training data pipelines.
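The break-even reasoning can be sketched as a small calculation: divide the one-off training cost by the per-request saving of the fine-tuned model over the few-shot prompt. This is a minimal sketch, not a pricing tool. The function name `break_even_requests` and the per-request costs in the usage lines are hypothetical placeholders, not figures from this report or from any published price list; only the $500 training cost is chosen to fall inside the $200-1000 range quoted above.

```python
from decimal import Decimal, ROUND_CEILING
from typing import Optional


def break_even_requests(
    training_cost_usd: str,
    few_shot_cost_per_request_usd: str,
    fine_tuned_cost_per_request_usd: str,
) -> Optional[int]:
    """Return the number of requests needed to recover the training cost.

    Amounts are passed as strings so Decimal can represent them exactly.
    Returns None when the fine-tuned model is not cheaper per request,
    because in that case the training cost is never recovered.
    """
    saving = Decimal(few_shot_cost_per_request_usd) - Decimal(
        fine_tuned_cost_per_request_usd
    )
    if saving <= 0:
        return None
    requests = Decimal(training_cost_usd) / saving
    # Round up: a partial request cannot pay off the remainder.
    return int(requests.to_integral_value(rounding=ROUND_CEILING))


# Hypothetical per-request costs, for illustration only.
print(break_even_requests("500", "0.0030", "0.0005"))
# No per-request saving, so there is no break-even point.
print(break_even_requests("500", "0.0010", "0.0020"))
```

With these placeholder inputs the first call lands on 200,000 requests, in line with the 200k+ figure above. The calculation ignores retraining after drift and pipeline maintenance, so the real break-even point is likely higher.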
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:54:18.248928+00:00— report_created — created