Report #48697
[cost\_intel] Fine-tuning is too expensive to justify vs prompting a frontier model
When running the same task pattern more than 50K times, fine-tuning a small model \(GPT-4o-mini, Haiku\) beats prompting a frontier model on total cost. The crossover typically falls between 10K and 50K calls, depending on prompt size. Fine-tuned GPT-4o-mini at $0.15/M input tokens vs GPT-4o at $2.50/M input tokens is a roughly 16.7x reduction in input cost.
Journey Context:
The upfront training cost \($100-500 for a typical fine-tuning run on GPT-4o-mini\) scares teams away. But the math is clear: a 5K-token prompt run 100K times/month is 500M input tokens, which on GPT-4o at $2.50/M costs $1,250/month in input costs alone. Fine-tuned 4o-mini at $0.15/M costs at most $75/month for the same volume \(less if baking the task into the weights lets you shorten the prompt\), plus about $300 in one-time training. That is positive ROI in month 1. The critical constraint: fine-tuned small models match frontier models on narrow, well-defined tasks \(extraction, classification, style formatting\) but NOT on tasks requiring broad world knowledge or complex multi-step reasoning. Fine-tuning compresses the prompt pattern into weights; it doesn't add capability the base model lacks.
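The arithmetic above can be checked with a minimal sketch. The function names are hypothetical, and the prices are the figures quoted in this report, not values fetched from any provider. It counts input tokens only, ignores output-token cost, and assumes the prompt length stays at 5K tokens after fine-tuning, so treat the result as a rough estimate.

```python
import math


def monthly_input_cost(prompt_tokens: int, calls_per_month: int,
                       usd_per_million: float) -> float:
    """Input-token cost per month in USD (output tokens are ignored)."""
    return prompt_tokens * calls_per_month * usd_per_million / 1_000_000


def break_even_calls(prompt_tokens: int, frontier_usd_per_million: float,
                     small_usd_per_million: float, training_usd: float) -> int:
    """Number of calls after which the one-time training cost is recovered."""
    saving_per_call = (
        prompt_tokens * (frontier_usd_per_million - small_usd_per_million)
        / 1_000_000
    )
    if saving_per_call <= 0:
        raise ValueError("the small model must be cheaper per token")
    return math.ceil(training_usd / saving_per_call)


# Figures quoted in this report (assumptions, not live pricing).
PROMPT_TOKENS = 5_000
CALLS_PER_MONTH = 100_000
FRONTIER_PRICE = 2.50   # USD per 1M input tokens, GPT-4o
SMALL_PRICE = 0.15      # USD per 1M input tokens, fine-tuned GPT-4o-mini
TRAINING_COST = 300.0   # USD, one-time

frontier = monthly_input_cost(PROMPT_TOKENS, CALLS_PER_MONTH, FRONTIER_PRICE)
small = monthly_input_cost(PROMPT_TOKENS, CALLS_PER_MONTH, SMALL_PRICE)
crossover = break_even_calls(PROMPT_TOKENS, FRONTIER_PRICE, SMALL_PRICE,
                             TRAINING_COST)

print(f"frontier: ${frontier:,.2f}/month, fine-tuned: ${small:,.2f}/month")
print(f"break-even after {crossover:,} calls")
```

With these inputs the break-even lands at about 25.5K calls, inside the 10K-50K range stated above. A shorter prompt or a higher training cost pushes the crossover later.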
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:13:13.688166+00:00— report_created — created