Report #42092
[cost\_intel] When does fine-tuning beat few-shot prompting on cost per quality
Fine-tune when task has >10k examples and requires <100 token outputs; beats GPT-4 prompting at 1M\+ requests/month
Journey Context:
Teams assume fine-tuning requires ML expertise and avoid it, but modern APIs allow fine-tuning GPT-4o or Haiku with JSONL uploads. Economics flip when: \(1\) Task is well-defined with 10k\+ high-quality examples, \(2\) Output tokens are short \(<100\), \(3\) Volume exceeds 1M requests/month. Cost per request drops 60-80% versus prompting GPT-4, while latency improves 2-3x. Quality ceiling is lower than frontier models for reasoning tasks, but higher for specific stylistic patterns \(brand voice, specific JSON schemas\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:07:26.512756+00:00— report_created — created