Report #62704
[cost_intel] Prompting a large model vs fine-tuning a small model: which is cheaper per quality point at production volume?
Fine-tune a small model (GPT-4o-mini, Haiku) when you have >5K labeled examples and >10K inference calls/month on a stable task distribution. A fine-tuned small model typically matches a prompted large model at 10-50x lower inference cost. The upfront fine-tuning cost ($50-500) is typically repaid by inference savings within the first month at production volume.
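The rule of thumb above can be expressed as a decision check plus a break-even calculation. The sketch below assumes the thresholds quoted above and the illustrative per-call prices from the journey context ($0.03 for a prompted large model, $0.002 for a fine-tuned small one); the function names are hypothetical, and real prices should come from your provider's current price list.

```python
import math

# Thresholds from the rule of thumb above; starting points, not fixed constants.
MIN_LABELED_EXAMPLES = 5_000
MIN_MONTHLY_CALLS = 10_000


def should_fine_tune(labeled_examples: int, monthly_calls: int, task_is_stable: bool) -> bool:
    """Apply the heuristic: enough labels, enough volume, and a stable task."""
    return (
        labeled_examples > MIN_LABELED_EXAMPLES
        and monthly_calls > MIN_MONTHLY_CALLS
        and task_is_stable
    )


def break_even_calls(upfront_usd: float, prompted_per_call: float, finetuned_per_call: float) -> int:
    """Number of calls before per-call savings repay the one-time fine-tuning cost."""
    savings = prompted_per_call - finetuned_per_call
    if savings <= 0:
        raise ValueError("fine-tuned model must be cheaper per call to ever break even")
    return math.ceil(upfront_usd / savings)


print(should_fine_tune(6_000, 20_000, task_is_stable=True))  # True
print(break_even_calls(500, 0.03, 0.002))  # 17858
```

With these prices, even the $500 end of the range is repaid after roughly 18K calls: well inside the first month at 100K calls/month, and a little under two months at the 10K-calls/month threshold.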
Journey Context:
The common mistake is comparing only per-call costs without accounting for volume and for the few-shot token overhead that a large model needs to match fine-tuned quality. A prompted Sonnet call with 5 few-shot examples might cost $0.03. A fine-tuned Haiku with zero examples can achieve the same quality at $0.002 per call, which is 15x cheaper. At 100K calls/month, that's $3,000 vs $200.
The hidden costs of fine-tuning:
(1) Training data preparation: you need clean, representative examples, which can take days of engineering time.
(2) Evaluation infrastructure: you must systematically compare fine-tuned vs prompted quality, not just spot-check.
(3) Distribution drift: if your task changes frequently (new categories, updated formats), each retraining cycle adds latency.
Fine-tuning fails when your task is unstable or when you can't afford the iteration cycle. It excels when the task is well-defined and high-volume: PII extraction, format normalization, domain-specific classification, structured data conversion.
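The $3,000 vs $200 comparison and the evaluation point reduce to the same calculation the title asks about: score each candidate on the same held-out set, then divide its monthly bill by its quality. A minimal sketch, assuming "quality point" means one percentage point of exact-match accuracy; the `Candidate` class, the model names, the `stub` classifier and the two-item eval set are hypothetical stand-ins for real API calls and a real labeled set.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (input text, expected label)


@dataclass
class Candidate:
    name: str
    cost_per_call: float  # USD per call, including any few-shot prompt tokens
    predict: Callable[[str], str]  # hypothetical wrapper around one model call


def accuracy(predict: Callable[[str], str], eval_set: Sequence[Example]) -> float:
    """Fraction of held-out examples the candidate labels exactly right."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    hits = sum(1 for text, expected in eval_set if predict(text) == expected)
    return hits / len(eval_set)


def compare(candidates: Sequence[Candidate], eval_set: Sequence[Example], calls_per_month: int) -> dict:
    """Score every candidate on the same eval set and price it at the given volume."""
    report = {}
    for c in candidates:
        acc = accuracy(c.predict, eval_set)
        monthly = c.cost_per_call * calls_per_month
        # Assumption: one "quality point" = one percentage point of accuracy.
        per_point = monthly / (acc * 100) if acc > 0 else float("inf")
        report[c.name] = {"accuracy": acc, "monthly_usd": monthly, "usd_per_quality_point": per_point}
    return report


# Toy labeled set and a stub classifier standing in for both real models.
eval_set = [("ssn 123-45-6789", "PII"), ("the weather is nice", "NONE")]


def stub(text: str) -> str:
    return "PII" if any(ch.isdigit() for ch in text) else "NONE"


report = compare(
    [Candidate("prompted-sonnet", 0.03, stub), Candidate("fine-tuned-haiku", 0.002, stub)],
    eval_set,
    calls_per_month=100_000,
)
for name, row in report.items():
    print(name, row)
```

Because both stubs score identically here, the per-point cost differs by exactly the 15x price ratio. With real models the accuracies will differ, and the per-point figure shows whether the cheaper model's quality loss outweighs its price advantage.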
Lifecycle
2026-06-20T11:44:04.182257+00:00— report_created — created