Report #59056
[cost\_intel] Fine-tuning assumed to be always more expensive than prompting
For any task running over 50K inferences per month with a stable prompt pattern, calculate the fine-tuning crossover. Fine-tuning GPT-4o-mini or Claude Haiku typically breaks even at 20-50K inferences and then delivers 30-80% cost reduction per quality point versus prompting a frontier model for the same task.
Journey Context:
The common mental model is that fine-tuning is expensive and prompting is cheap. At scale this is backwards. Prompting a frontier model costs $3-15/M input tokens, and complex prompts often run 2-5K tokens per request. Fine-tuning a small model costs $50-500 upfront for the training run, but inference then costs $0.15-0.60/M tokens with much shorter prompts, because the task knowledge is in the weights. At 100K requests/month with a 3K-token prompt on Sonnet \($3/M input\), you pay $900/month in input tokens. Fine-tuned Haiku with 500-token prompts at $0.25/M input costs $12.50/month plus roughly $200 training, totaling $212.50 in month one and $12.50/month thereafter. At these figures the training cost is recovered after about 22.5K requests, well inside the first month; at lower volumes or with higher training costs the crossover is typically 1-3 months. Fine-tuning wins when volume is high, the task definition is narrow and stable, and prompt complexity is the cost driver. Prompting wins when volume is low, the task definition changes frequently, or the task requires broad reasoning that fine-tuning cannot compress into weights.
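The crossover calculation described above can be sketched as a small calculator. This is a minimal illustration, not a pricing tool: the `Option` class and `break_even_requests` function are hypothetical names, the inputs are the example figures from this report (100K requests/month, 3K tokens at $3/M, 500 tokens at $0.25/M, $200 training), and it counts input tokens only, ignoring output tokens, hosting fees, and retraining.

```python
import math
from dataclasses import dataclass


@dataclass
class Option:
    """One way of serving the task (hypothetical helper; inputs are illustrative)."""
    prompt_tokens: int          # input tokens per request
    usd_per_m_input: float      # price per million input tokens
    upfront_usd: float = 0.0    # one-time cost, e.g. a fine-tuning run

    def monthly_cost(self, requests_per_month: int) -> float:
        # Input-token cost only; output tokens are ignored in this sketch.
        total_tokens = requests_per_month * self.prompt_tokens
        return total_tokens / 1_000_000 * self.usd_per_m_input


def break_even_requests(prompted: Option, tuned: Option) -> float:
    """Number of requests after which the tuned option's upfront cost is recovered."""
    saving_per_request = (
        prompted.prompt_tokens * prompted.usd_per_m_input
        - tuned.prompt_tokens * tuned.usd_per_m_input
    ) / 1_000_000
    if saving_per_request <= 0:
        return math.inf  # the tuned option never pays back
    return (tuned.upfront_usd - prompted.upfront_usd) / saving_per_request


# Example figures taken from the report text above.
REQUESTS_PER_MONTH = 100_000
prompted = Option(prompt_tokens=3_000, usd_per_m_input=3.00)
tuned = Option(prompt_tokens=500, usd_per_m_input=0.25, upfront_usd=200.0)

prompted_monthly = prompted.monthly_cost(REQUESTS_PER_MONTH)
tuned_monthly = tuned.monthly_cost(REQUESTS_PER_MONTH)
tuned_month_one = tuned_monthly + tuned.upfront_usd
crossover = break_even_requests(prompted, tuned)

print(f"Prompted frontier model: ${prompted_monthly:.2f}/month")
print(f"Fine-tuned small model:  ${tuned_monthly:.2f}/month, "
      f"${tuned_month_one:.2f} in month one")
print(f"Break-even after about {math.ceil(crossover):,} requests "
      f"({crossover / REQUESTS_PER_MONTH:.2f} months at this volume)")
```

Swapping in your own token counts, prices, and training cost shows how quickly the crossover moves: halving the volume doubles the payback time, while a prompt that is already short shrinks the per-request saving and can push the break-even out indefinitely.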
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:36:59.032397+00:00— report_created — created