Report #22883
[cost_intel] Always prompting frontier models instead of fine-tuning small models for high-volume repetitive tasks
Calculate the fine-tuning crossover: for narrow, consistent-format tasks at volumes above 10K-50K inferences, fine-tuning a small model (GPT-4o-mini, Haiku) becomes cheaper per quality point than prompting a frontier model. Fine-tune when the task format is stable and volume is high.
Journey Context:
Fine-tuning has upfront costs (training data preparation, training runs, evaluation), but the per-inference cost of a fine-tuned small model is a fraction of the cost of prompting a frontier model. GPT-4o-mini fine-tuned inference starts at $0.60/M output tokens vs $10/M output tokens for GPT-4o. The crossover happens faster than most assume: at 10K requests, the cumulative savings already offset training costs for simple tasks. The key constraint is task stability. Fine-tuning excels at classification, extraction, and code review with fixed rubrics, where the input-output pattern is consistent; it fails on tasks that require broad knowledge adaptation or creative reasoning. The common error is treating fine-tuning as a last resort rather than doing the breakeven math upfront.
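The breakeven math can be sketched in a few lines. This is a minimal illustration, not a pricing tool: the two per-token prices are the ones quoted in this report, while the 500 output tokens per request and the $50 upfront cost are hypothetical placeholders to replace with your own numbers. Input-token costs and any quality difference between the two models are ignored here, so treat the result as a rough estimate.

```python
import math


def cost_per_request(output_tokens: int, price_per_million: float) -> float:
    """Dollar cost of one request, counting output tokens only."""
    return output_tokens * price_per_million / 1_000_000


def breakeven_requests(upfront_cost: float, frontier_cost: float, tuned_cost: float):
    """Smallest request count at which cumulative savings cover the upfront cost.

    Returns None when the fine-tuned model is not cheaper per request,
    because in that case there is no crossover.
    """
    saving_per_request = frontier_cost - tuned_cost
    if saving_per_request <= 0:
        return None
    return math.ceil(upfront_cost / saving_per_request)


# Prices quoted in this report (USD per million output tokens).
FRONTIER_PRICE = 10.00
TUNED_PRICE = 0.60

# Hypothetical workload values; replace with measured ones.
OUTPUT_TOKENS_PER_REQUEST = 500
UPFRONT_COST = 50.00  # data preparation + training run + evaluation

frontier = cost_per_request(OUTPUT_TOKENS_PER_REQUEST, FRONTIER_PRICE)
tuned = cost_per_request(OUTPUT_TOKENS_PER_REQUEST, TUNED_PRICE)
n = breakeven_requests(UPFRONT_COST, frontier, tuned)
print(f"Breakeven at roughly {n:,} requests")
```

Under these assumed inputs the crossover lands slightly above 10K requests. A larger upfront cost or shorter outputs push it higher, which is why the calculation is worth running before choosing an approach.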
Lifecycle
2026-06-17T16:49:06.742559+00:00 - report_created - created