Report #66041
[cost_intel] Fine-tuning is always more expensive than just prompting better
Fine-tune a small model (GPT-4o-mini, Haiku) when you have >2K labeled examples AND >50K monthly inferences. The one-off training cost ($50-500) amortizes within 1-2 months, and per-inference cost drops 10-30x compared with prompting a frontier model for the same task.
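The rule above combines two gates (data AND traffic) with a break-even calculation. A minimal sketch, assuming hypothetical per-call prices of $0.01 for the prompted frontier model and $0.0005 for the fine-tuned small model (20x cheaper, inside the report's 10-30x range); the function names are illustrative, not from any library:

```python
# Rule-of-thumb thresholds from the report; both must be exceeded.
MIN_LABELED_EXAMPLES = 2_000
MIN_MONTHLY_INFERENCES = 50_000


def meets_volume_thresholds(labeled_examples: int, monthly_inferences: int) -> bool:
    """True only when BOTH the data and the traffic conditions hold."""
    return (labeled_examples > MIN_LABELED_EXAMPLES
            and monthly_inferences > MIN_MONTHLY_INFERENCES)


def breakeven_months(training_cost: float,
                     monthly_inferences: int,
                     prompted_cost_per_call: float,
                     finetuned_cost_per_call: float) -> float:
    """Months until the one-off training cost is repaid by per-call savings."""
    monthly_savings = monthly_inferences * (prompted_cost_per_call - finetuned_cost_per_call)
    if monthly_savings <= 0:
        # The fine-tuned model is not cheaper per call, so training never pays back.
        return float("inf")
    return training_cost / monthly_savings


# Hypothetical per-call prices: $0.01 prompted frontier vs. $0.0005 fine-tuned.
print(meets_volume_thresholds(3_000, 60_000))  # True
print(round(breakeven_months(500, 60_000, 0.01, 0.0005), 2))  # 0.88
```

With these assumed prices, even the top-end $500 training run at exactly 50K calls/month pays back in about 1.05 months (500 / 475), which is consistent with the 1-2 month figure; plug in your own provider's prices before relying on it.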
Journey Context:
The real comparison isn't 'fine-tuning vs prompting the same model'; it's 'fine-tuned small model vs prompted frontier model.' A fine-tuned GPT-4o-mini on structured extraction matches prompted GPT-4o quality at 1/30th the per-token cost. For well-defined tasks (extraction, classification, formatting, style transfer), the quality gap closes with as few as 2K examples. For open-ended creative tasks or novel reasoning, fine-tuning doesn't close the gap, because the bottleneck there is the frontier model's reasoning capability, not task-specific knowledge. The silent cost of not fine-tuning: a team spending $10K/month on GPT-4o for JSON extraction could instead spend a one-off $200 on fine-tuning plus $300/month on GPT-4o-mini for the same quality.
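The JSON-extraction example can be checked with simple arithmetic. A minimal sketch, assuming the report's monthly figures hold steady over a 12-month horizon (the horizon is an assumption, not from the report); the task labels and helper names are hypothetical:

```python
# Hypothetical labels for the task types the report says fine-tuning handles well.
WELL_DEFINED_TASKS = {"extraction", "classification", "formatting", "style_transfer"}


def gap_likely_closes(task_type: str) -> bool:
    """Open-ended creative work and novel reasoning fall outside this set."""
    return task_type in WELL_DEFINED_TASKS


def cumulative_cost(months: int, monthly_inference: float, one_off: float = 0.0) -> float:
    """Total spend after `months`, including any one-off training cost."""
    return one_off + months * monthly_inference


# Figures from the report's JSON-extraction example, projected over 12 months.
prompted_frontier = cumulative_cost(12, monthly_inference=10_000.0)
fine_tuned_small = cumulative_cost(12, monthly_inference=300.0, one_off=200.0)
print(gap_likely_closes("extraction"))  # True
print(prompted_frontier, fine_tuned_small)  # 120000.0 3800.0
print(prompted_frontier - fine_tuned_small)  # 116200.0
```

The task-type check comes first on purpose: the savings only matter if the fine-tuned model actually reaches the same quality, which the report limits to well-defined tasks.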
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:19:35.469756+00:00— report_created — created