Report #78850
[cost_intel] Prompting an expensive frontier model for high-volume, narrow, repetitive tasks that a fine-tuned smaller model could handle
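Fine-tune GPT-4o-mini for narrow tasks with more than 500 training examples and more than 10K monthly inferences. At OpenAI's list prices, fine-tuned GPT-4o-mini output costs $1.20/M tokens ($0.60/M via the Batch API) versus $10/M for GPT-4o. That is roughly 8x cheaper per token, or about 17x when batched fine-tuned requests replace synchronous GPT-4o calls. On the narrow task, quality often matches or exceeds zero-shot GPT-4o.
Journey Context:
The crossover economics: fine-tuning is billed per training token (dataset tokens × epochs), at $3/M for GPT-4o-mini, so the one-off training bill is usually modest. For example, 1,000 examples of about 2K tokens trained for 3 epochs is 6M training tokens, about $18. Against synchronous GPT-4o, fine-tuned GPT-4o-mini saves $8.80 per 1M output tokens ($10.00 minus $1.20). That $18 run therefore breaks even after roughly 2M output tokens, about 20K requests of 100 output tokens each. A $100 run would need about 11M output tokens (about 114K such requests). If you're running 50K+ inferences/month on a narrow task, the $18 run pays for itself in month one, and every later request costs roughly 8x less. Input tokens are also about 8x cheaper ($0.30/M vs $2.50/M), so savings grow further when the fine-tuned model lets you drop a long few-shot prompt.
The key constraints: (1) the task must be narrow and stable, since if the task distribution shifts monthly, repeated re-fine-tuning erodes the savings; (2) you need 500+ high-quality examples for classification and 1,000+ for generation tasks; (3) fine-tuned models generalize worse outside their training distribution. Common mistakes: fine-tuning a low-volume task where the training cost never amortizes, or fine-tuning when a well-prompted smaller model would suffice.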
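The break-even reasoning and rules of thumb in this report can be turned into a quick go/no-go check. This is a minimal sketch, not a pricing tool. The default prices in `Pricing` are assumptions and should be replaced with current list prices. `Pricing`, `training_cost`, `break_even_requests`, `should_fine_tune`, `MIN_EXAMPLES` and `MIN_MONTHLY_REQUESTS` are hypothetical names encoding the report's heuristics: 500+ examples for classification, 1,000+ for generation, 10K+ monthly inferences, and a stable task. The sketch counts output tokens only, so it understates savings when prompts are long.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1M tokens. Defaults are assumptions; substitute current list prices."""
    baseline_output: float = 10.00  # assumed frontier-model (GPT-4o) output price
    tuned_output: float = 1.20      # assumed fine-tuned GPT-4o-mini output price (synchronous)
    training: float = 3.00          # assumed fine-tuning price per 1M training tokens


# Hypothetical encoding of the report's rules of thumb.
MIN_EXAMPLES = {"classification": 500, "generation": 1000}
MIN_MONTHLY_REQUESTS = 10_000


def training_cost(n_examples: int, tokens_per_example: int, epochs: int,
                  p: Pricing = Pricing()) -> float:
    """One-off training bill: each example's tokens are billed once per epoch."""
    return n_examples * tokens_per_example * epochs * p.training / 1_000_000


def break_even_requests(train_cost: float, out_tokens_per_request: int,
                        p: Pricing = Pricing()) -> float:
    """Requests needed before output-token savings repay the training run."""
    saving_per_request = out_tokens_per_request * (p.baseline_output - p.tuned_output) / 1_000_000
    if saving_per_request <= 0:
        return float("inf")  # fine-tuned model is not cheaper per token: never breaks even
    return train_cost / saving_per_request


def should_fine_tune(task_type: str, n_examples: int, monthly_requests: int,
                     task_is_stable: bool, train_cost: float,
                     out_tokens_per_request: int, p: Pricing = Pricing()) -> bool:
    """True if every rule of thumb passes and training pays back within one month."""
    if not task_is_stable:
        return False  # a shifting task distribution means repeated re-training
    if n_examples < MIN_EXAMPLES[task_type]:
        return False  # not enough high-quality examples for this task type
    if monthly_requests < MIN_MONTHLY_REQUESTS:
        return False  # too little volume for the training cost to amortize
    return break_even_requests(train_cost, out_tokens_per_request, p) <= monthly_requests


if __name__ == "__main__":
    cost = training_cost(n_examples=1_000, tokens_per_example=2_000, epochs=3)
    print(f"training cost: ${cost:.2f}")
    print(f"break-even: {break_even_requests(cost, out_tokens_per_request=100):,.0f} requests")
    print(should_fine_tune("classification", 1_000, 50_000, True, cost, 100))
```

Running the file prints the training bill, the break-even request count, and the decision for a 1,000-example classification task at 50K requests/month. The thresholds live in module-level constants so you can swap in your own heuristics without touching the arithmetic.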
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:56:39.396665+00:00 - report_created - created