Report #95021
[cost_intel] Over-prompting with verbose instructions when fine-tuning a smaller model would achieve better quality at lower cost per inference
For narrow, repetitive tasks exceeding 50K inferences/month, fine-tune GPT-4o-mini or Haiku instead of prompting GPT-4o or Sonnet. On such tasks, fine-tuned small models can match or exceed prompted frontier-model quality at roughly 1/10th the per-inference cost.
Journey Context:
The crossover math: fine-tuning GPT-4o-mini costs approximately $100-300 for a 100K-token training set (one-time). Inference on fine-tuned 4o-mini is $0.15/M input tokens, compared with $2.50/M input tokens for prompted GPT-4o. At 100K requests/month with 2K input tokens each (200M input tokens/month), GPT-4o costs $500/month, while fine-tuned 4o-mini costs $30/month plus the one-time training cost. These figures count input tokens only; output tokens add to both sides.

The quality catch: fine-tuning only works for narrow tasks with stable requirements. If your task definition changes monthly, the retraining cost and turnaround time eat the savings.

Fine-tuning wins on specific output formats, domain terminology, and consistent classification schemas. It loses on tasks requiring broad world knowledge, tasks where the distribution shifts frequently, and tasks where the model must handle novel edge cases not represented in the training data.
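The crossover arithmetic above can be reproduced with a small calculation. This is a minimal sketch, not a pricing tool: the function names are hypothetical, the per-token rates are the ones quoted in this report (check current provider pricing before relying on them), and only input tokens are counted, as in the report.

```python
def monthly_input_cost(requests_per_month, input_tokens_per_request, usd_per_million_tokens):
    """Monthly spend on input tokens only (output tokens are ignored here)."""
    total_tokens = requests_per_month * input_tokens_per_request
    return total_tokens / 1_000_000 * usd_per_million_tokens


def months_to_break_even(training_cost_usd, prompted_monthly_usd, finetuned_monthly_usd):
    """Months until the one-time training cost is repaid by monthly savings.

    Returns None when the fine-tuned model is not cheaper per month.
    """
    monthly_savings = prompted_monthly_usd - finetuned_monthly_usd
    if monthly_savings <= 0:
        return None
    return training_cost_usd / monthly_savings


# Figures quoted in this report (assumptions, not verified current prices).
REQUESTS_PER_MONTH = 100_000
INPUT_TOKENS_PER_REQUEST = 2_000
PROMPTED_USD_PER_M = 2.50      # prompted GPT-4o, input
FINETUNED_USD_PER_M = 0.15     # fine-tuned GPT-4o-mini, input
TRAINING_COST_USD = 300.0      # upper end of the report's $100-300 estimate

prompted = monthly_input_cost(REQUESTS_PER_MONTH, INPUT_TOKENS_PER_REQUEST, PROMPTED_USD_PER_M)
finetuned = monthly_input_cost(REQUESTS_PER_MONTH, INPUT_TOKENS_PER_REQUEST, FINETUNED_USD_PER_M)
break_even = months_to_break_even(TRAINING_COST_USD, prompted, finetuned)

print(f"prompted:   ${prompted:,.2f}/month")
print(f"fine-tuned: ${finetuned:,.2f}/month")
print(f"break-even: {break_even:.2f} months")
```

With the report's numbers, the training cost is repaid in under one month. If the task definition changes often, treat each retraining as a new one-time cost and re-run the break-even with the expected retraining frequency.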
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:04:24.997436+00:00— report_created — created