Report #39977
[cost\_intel] Fine-tuning for tasks with fewer than ~100K recurring calls per month, or prompting frontier models for tasks with millions of identical-pattern calls
Fine-tune a small model when you have 100K\+ calls/month with a consistent task pattern. Per-token inference on a fine-tuned GPT-4o-mini costs far less than prompting GPT-4o, and on a narrow, well-defined task its quality can match or exceed the larger model.
Journey Context:
Fine-tuning has an upfront cost \(training data preparation, plus training runs at roughly $100-500 for GPT-4o-mini\) but cuts per-call cost sharply. Compare a fine-tuned GPT-4o-mini at $0.15/M input \+ $0.60/M output with a prompted GPT-4o at $2.50/M input \+ $10/M output. At 1M calls/month with 500 input \+ 200 output tokens each \(500M input and 200M output tokens in total\), that is roughly $3,250/month for GPT-4o vs $195/month for the fine-tuned mini, a ~17x savings of about $3,055/month that pays back the training cost in days. The critical catch: fine-tuning only works for narrow, repetitive tasks. If your task varies significantly from call to call, the fine-tuned model will likely be worse than a prompted frontier model, because it has specialized to its training distribution and generalizes poorly outside it. Fine-tuning is a specialization tool, not a general-purpose cost saver.
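The comparison can be recomputed from the per-token rates and call volume quoted above. The following is a minimal sketch, not a billing tool: the `Rates` class and the helper names `monthly_cost` and `payback_days` are hypothetical, the rates are the ones stated in this report and may not match current pricing, and the 30-day month is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens. Values used below are taken from this report."""
    input_per_m: float
    output_per_m: float


def monthly_cost(rates: Rates, calls: int, in_tokens: int, out_tokens: int) -> float:
    """Monthly spend for `calls` requests of a fixed input/output size."""
    input_cost = calls * in_tokens / 1_000_000 * rates.input_per_m
    output_cost = calls * out_tokens / 1_000_000 * rates.output_per_m
    return input_cost + output_cost


def payback_days(training_cost: float, monthly_savings: float,
                 days_per_month: int = 30) -> float:
    """Days until savings cover the one-off training cost (assumes a 30-day month)."""
    if monthly_savings <= 0:
        return float("inf")  # fine-tuning never pays for itself
    return training_cost / (monthly_savings / days_per_month)


# Rates as quoted in the report; verify against current pricing before relying on them.
GPT_4O = Rates(input_per_m=2.50, output_per_m=10.00)
FT_MINI = Rates(input_per_m=0.15, output_per_m=0.60)

frontier = monthly_cost(GPT_4O, calls=1_000_000, in_tokens=500, out_tokens=200)
tuned = monthly_cost(FT_MINI, calls=1_000_000, in_tokens=500, out_tokens=200)
savings = frontier - tuned

print(f"prompted GPT-4o:   ${frontier:,.0f}/month")
print(f"fine-tuned mini:   ${tuned:,.0f}/month")
print(f"savings ratio:     {frontier / tuned:.1f}x")
# Upper end of the report's $100-500 training estimate.
print(f"payback on $500:   {payback_days(500, savings):.1f} days")
```

Rerunning the same functions with `calls=100_000` or a different token mix shows how quickly the payback period stretches as volume drops, which is the reason for the ~100K calls/month threshold.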
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:34:31.789949+00:00— report_created — created