Report #71480
[cost\_intel] Assuming prompting is always cheaper than fine-tuning, or that fine-tuning always wins at scale
Fine-tuning a small model beats prompting a frontier model on cost-per-quality when \(a\) the task is narrow and stable, \(b\) you have 500\+ high-quality examples, and \(c\) you will run more than ~50k inferences. Below that volume, the fixed costs of fine-tuning \(data prep $200-2000, training compute $50-500, plus evaluation labor\) exceed the cumulative per-call savings. Above 1M inferences, fine-tuning typically cuts total cost by 5-10x.
Journey Context:
The crossover math, counting input tokens only: if a fine-tuned Haiku matches prompted Sonnet quality, you save ~$2.75 per million input tokens \($3.00 - $0.25\). At 2,000 input tokens per request, that is $0.0055 saved per request. To amortize a $300 total fine-tuning cost \(data prep \+ compute \+ eval\), you need ~55,000 requests \($300 / $0.0055\). The real crossover is often higher because: \(1\) fine-tuned models need ongoing evaluation and periodic retraining as the task distribution shifts, \(2\) the quality match is rarely perfect, and you typically accept a 2-5% quality loss, \(3\) data preparation, not compute, is the dominant cost, since labeling 500\+ high-quality examples is expensive labor. Fine-tuning wins decisively above 1M requests, where per-call savings compound. The hidden trap: fine-tuned models are brittle to distribution shift. If the task changes, you have to retrain, whereas a prompted model adapts as soon as you change the prompt. Fine-tuning is an amortization bet that the task stays stable.
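The arithmetic above can be reproduced with a short Python sketch. It is illustrative only: the function names are hypothetical, the prices and token counts are the figures quoted in this report rather than current list prices, and output-token costs and quality loss are ignored.

```python
import math


def savings_per_request(tokens_per_request, prompted_price_per_m, tuned_price_per_m):
    """Dollars saved per request, counting input tokens only (assumption)."""
    return tokens_per_request * (prompted_price_per_m - tuned_price_per_m) / 1_000_000


def break_even_requests(fixed_cost, per_request_savings):
    """Smallest number of requests whose cumulative savings cover the fixed cost."""
    return math.ceil(fixed_cost / per_request_savings)


def total_cost_ratio(requests, tokens_per_request, prompted_price_per_m,
                     tuned_price_per_m, fixed_cost):
    """Prompted total cost divided by fine-tuned total cost (fixed cost included)."""
    prompted = requests * tokens_per_request * prompted_price_per_m / 1_000_000
    tuned = fixed_cost + requests * tokens_per_request * tuned_price_per_m / 1_000_000
    return prompted / tuned


# Figures taken from the report: $3.00 vs $0.25 per million input tokens,
# 2,000 input tokens per request, $300 total fine-tuning cost.
saved = savings_per_request(2000, 3.00, 0.25)
print(f"saved per request: ${saved:.4f}")
print("break-even requests:", break_even_requests(300, saved))

# Hypothetical: two retraining rounds at $150 each double the fixed cost,
# which roughly doubles the break-even volume.
print("break-even with retraining:", break_even_requests(300 + 2 * 150, saved))

# Total-cost ratio at 1M requests under the same assumptions.
print("cost ratio at 1M requests:", total_cost_ratio(1_000_000, 2000, 3.00, 0.25, 300))
```

Raising `fixed_cost` toward the upper end of the data prep and compute ranges quoted in the summary pushes the break-even volume up proportionally, which is one way to see why the real crossover is often higher than the headline number.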
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:33:39.668451+00:00— report_created — created