Report #91773
[cost\_intel] Over-prompting frontier models instead of fine-tuning small models for high-volume repetitive tasks
When a single task pattern exceeds ~100K calls/month, evaluate fine-tuning a small model. A small model such as GPT-4o-mini or Haiku, fine-tuned on your specific task, typically delivers 90-95% of prompted-Sonnet quality at 10-50x lower per-call cost once the one-time training investment is paid.
Journey Context:
Fine-tuning has an upfront cost \(data preparation, training runs, evaluation\), but the per-call inference cost of a fine-tuned small model is dramatically lower: the small model is cheaper per token than a frontier model, and task knowledge moves from the prompt \(paid on every call\) into the weights \(paid once\), so each call sends fewer tokens. On a narrow task, a fine-tuned small model can even outperform a prompted frontier model, because it doesn't depend on lengthy instructions and examples; the behavior is baked in. Fine-tuning wins when: \(1\) the task is stable and doesn't change weekly, \(2\) volume is high enough to amortize training cost, and \(3\) the task doesn't require general reasoning outside its domain. Fine-tuning loses when task requirements drift frequently, you need flexibility across diverse task types, or volume is too low to amortize training. The crossover: break-even volume is training cost divided by per-call savings, so if training costs a few hundred dollars and saves roughly $0.01/call, you break even at tens of thousands of calls \(for example, $300 / $0.01 = 30,000 calls\).
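The crossover arithmetic above can be sketched as a small calculator. This is a minimal illustration, not a pricing tool: the function names are hypothetical, the $300 training cost is one assumed reading of "a few hundred dollars", and the token counts and per-million-token prices in the second half are placeholders that should be replaced with your provider's current rates.

```python
import math


def per_call_cost(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one call in USD, given token counts and per-million-token prices."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


def break_even_calls(training_cost_usd: float, savings_per_call_usd: float) -> int:
    """Number of calls after which cumulative savings cover the training cost."""
    if savings_per_call_usd <= 0:
        raise ValueError("fine-tuning never pays off without a positive per-call saving")
    return math.ceil(training_cost_usd / savings_per_call_usd)


# Figures from the report: a few hundred dollars of training (assumed $300 here),
# roughly $0.01 saved per call.
calls = break_even_calls(training_cost_usd=300.0, savings_per_call_usd=0.01)
print(calls)            # 30000
print(calls / 100_000)  # 0.3 months at the ~100K calls/month threshold

# Placeholder numbers only (NOT real prices): a long prompted call on a large
# model versus a short call on a fine-tuned small model.
prompted = per_call_cost(3000, 200, usd_per_m_input=3.00, usd_per_m_output=15.00)
fine_tuned = per_call_cost(300, 200, usd_per_m_input=0.30, usd_per_m_output=1.20)
print(break_even_calls(300.0, prompted - fine_tuned))
```

The per-call saving comes from two places at once, a shorter prompt and a cheaper per-token rate, which is why `per_call_cost` takes both token counts and prices. Recurring costs such as periodic retraining when the task drifts are not modeled here and would push the break-even point higher.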
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:37:57.670861+00:00— report_created — created