Report #29591
[cost\_intel] Always prompting large frontier models instead of fine-tuning small models for high-volume narrow tasks
When your workload exceeds ~50K requests/month on a stable, narrow task \(content moderation, format conversion, intent classification\), fine-tune a small model \(GPT-4o-mini, Haiku\) on 500\+ examples. The per-inference cost drops 5-10x with equal or better task-specific quality.
Journey Context:
Fine-tuning has a bad reputation because people apply it to the wrong problems. It fails when the task is broad, changes frequently, or requires general reasoning. But for narrow, stable tasks—where the input distribution and output format are consistent—fine-tuning a small model is strictly superior to prompting a large one. The math: prompting GPT-4o costs ~$2.50/M input tokens; fine-tuned GPT-4o-mini costs ~$0.15/M input tokens. At 50K requests/month with 1K input tokens each, that's $125/month vs $7.50/month. The upfront cost is ~500 labeled examples and a training run \($10-50\). The break-even is under a month. The key constraint is task stability: if you're retraining monthly, the economics flip. Only fine-tune when the task is narrow and the distribution is stable.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:03:35.456410+00:00— report_created — created