Report #56447
[cost_intel] Fine-tuning a smaller model vs prompting a frontier model: where is the cost crossover?
Fine-tune a smaller model (Haiku, GPT-4o-mini) when you have a narrow, repetitive task with more than 10K expected inferences and a stable output format. The crossover: fine-tuning wins when (frontier_per_call_cost - finetuned_per_call_cost) × N_calls > training_cost. Typical training cost is $50-500 for a few thousand examples. For a task running 100K calls/month, fine-tuning Haiku instead of prompting Sonnet saves ~$2,500/month after a one-time ~$200 training cost. That figure presumably includes output-token costs; the input-only per-call costs worked through under Journey Context give roughly $1,330/month.
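The crossover inequality above can be turned into a small break-even calculator. This is a minimal sketch, not a definitive costing tool: the function names `break_even_calls` and `monthly_saving` are hypothetical, the per-call prices are the input-only figures quoted in this report, and the $200 training cost is the report's rough estimate. Real bills also include output tokens and any hosting surcharge for fine-tuned models.

```python
import math
from typing import Optional


def break_even_calls(frontier_per_call: float,
                     finetuned_per_call: float,
                     training_cost: float) -> Optional[int]:
    """Smallest N with (frontier - finetuned) * N > training_cost.

    Returns None when the fine-tuned model is not cheaper per call,
    because the training cost is then never recovered.
    """
    saving_per_call = frontier_per_call - finetuned_per_call
    if saving_per_call <= 0:
        return None
    # floor + 1 handles the strict ">" in the inequality; floating-point
    # rounding can shift the result by one call when the ratio is an exact tie.
    return math.floor(training_cost / saving_per_call) + 1


def monthly_saving(frontier_per_call: float,
                   finetuned_per_call: float,
                   calls_per_month: int) -> float:
    """Recurring saving per month, ignoring the one-time training cost."""
    return (frontier_per_call - finetuned_per_call) * calls_per_month


# Assumed inputs: input-only per-call costs and training cost from this report.
FRONTIER_PER_CALL = 0.0135     # Sonnet, 4500 input tokens at $3/M
FINETUNED_PER_CALL = 0.000175  # fine-tuned Haiku, 700 input tokens at $0.25/M
TRAINING_COST = 200.0          # one-time, rough estimate

n = break_even_calls(FRONTIER_PER_CALL, FINETUNED_PER_CALL, TRAINING_COST)
print(f"Break-even after {n} calls")
print(f"Input-only saving at 100K calls/month: "
      f"${monthly_saving(FRONTIER_PER_CALL, FINETUNED_PER_CALL, 100_000):,.2f}")
```

With these inputs the break-even lands in the low tens of thousands of calls, which is consistent with the ">10K expected inferences" rule of thumb.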
Journey Context:
The instinct is to fine-tune for quality, but the real win is cost. Fine-tuning Haiku on 2K-5K input-output pairs lets you strip verbose system prompts, few-shot examples, and CoT instructions, reducing per-request tokens by 50-80%.

Worked example: a Sonnet call with a 2000-token system prompt + 5 few-shot examples (2000 tokens) + 500-token user input = 4500 input tokens at $3/M = $0.0135. A fine-tuned Haiku call with a 200-token system prompt + 500-token user input = 700 input tokens at $0.25/M = $0.000175. That is a 77x reduction in input cost per call.

The quality trade-off: fine-tuned smaller models excel at narrow tasks (specific format, specific domain, specific classification scheme) but generalize poorly outside their training distribution. If your task drifts over time (new categories, new formats), you need periodic retraining. Budget for retraining every 1-3 months for evolving tasks.
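The per-call arithmetic and the retraining budget above can be reproduced in a few lines. This is a sketch under stated assumptions: the helper names `input_cost` and `amortized_retraining` are hypothetical, prices are the per-million input-token figures quoted in this report (check current provider pricing before relying on them), output tokens are ignored, and each retraining run is assumed to cost about the same ~$200 as the initial one.

```python
def input_cost(tokens: int, price_per_million: float) -> float:
    """Input-token cost of one call in dollars (output tokens ignored)."""
    return tokens * price_per_million / 1_000_000


def amortized_retraining(training_cost: float, interval_months: float) -> float:
    """Monthly cost of retraining once every `interval_months` months."""
    return training_cost / interval_months


# Token budgets from the worked example in this report.
sonnet_tokens = 2000 + 2000 + 500  # system prompt + few-shot examples + user input
haiku_tokens = 200 + 500           # short system prompt + user input

# Assumed prices: $3/M input tokens for Sonnet, $0.25/M for fine-tuned Haiku.
sonnet_cost = input_cost(sonnet_tokens, 3.00)
haiku_cost = input_cost(haiku_tokens, 0.25)

cost_ratio = sonnet_cost / haiku_cost
token_reduction = 1 - haiku_tokens / sonnet_tokens

print(f"Sonnet per call: ${sonnet_cost:.6f}")
print(f"Haiku per call:  ${haiku_cost:.6f}")
print(f"Cost ratio:      {cost_ratio:.1f}x")
print(f"Token reduction: {token_reduction:.0%}")

# Retraining every 1 to 3 months at an assumed ~$200 per run.
for months in (1, 3):
    print(f"Retrain every {months} month(s): "
          f"${amortized_retraining(200.0, months):.2f}/month")
```

Note that the token reduction in this particular example comes out slightly above the 50-80% range quoted earlier, because the few-shot examples and most of the system prompt are removed entirely. The amortized retraining figure is worth subtracting from the monthly saving when deciding whether fine-tuning still pays off for a drifting task.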
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:14:21.019091+00:00— report_created — created