Report #63601
[cost\_intel] When does fine-tuning beat few-shot prompting on cost per quality point for JSON extraction?
With more than 10 distinct output schemas per week, fine-tuning GPT-4o-mini cuts inference cost 5x \($0.60 vs. $3.00 per 1M tokens\) and improves accuracy 8-12% over few-shot GPT-4o. With fewer than 3 schemas, or with rapidly changing formats, few-shot prompting with Haiku wins on flexibility.
Journey Context:
Teams often avoid fine-tuning because of perceived complexity, but at scale the economics invert. Fine-tuned GPT-4o-mini costs $0.60/1M tokens versus $3.00/1M for few-shot GPT-4o, a 5x difference. The break-even is about 10 distinct schemas/week with stable formats; below that, the fixed cost of curating 100\+ training examples per schema dominates. Fine-tuning also eliminates the 'token bloat' of few-shot examples \(often 1-2k tokens per request\). The failure mode is schema volatility: if output formats change weekly, retraining churn \($0.008/1K tokens training cost\) destroys ROI. For stable schemas \(APIs, form extraction, classification\) above the break-even, fine-tuning dominates. Signal to switch: you are sending the same JSON schema examples in prompts more than 50 times/day.
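The trade-off above can be sketched as a rough monthly cost comparison for a single schema. This is a minimal sketch, not a definitive model: the per-token prices are the figures quoted in this report, while the `Workload` class, its field defaults such as 500 base tokens per request and 2,000 tokens per training example, and the `curation_cost_per_retrain` field are hypothetical values chosen for illustration. Substitute your own measurements and current provider pricing before relying on the result.

```python
from dataclasses import dataclass

# Prices as quoted in this report; verify against current provider pricing.
FEW_SHOT_PRICE_PER_M = 3.00    # USD per 1M tokens, few-shot GPT-4o
FINE_TUNED_PRICE_PER_M = 0.60  # USD per 1M tokens, fine-tuned GPT-4o-mini
TRAINING_PRICE_PER_K = 0.008   # USD per 1K training tokens


@dataclass
class Workload:
    """One schema's traffic profile. All defaults are illustrative assumptions."""
    requests_per_day: int
    base_tokens: int = 500            # hypothetical: task tokens without examples
    few_shot_tokens: int = 1500       # midpoint of the report's 1-2k overhead
    training_examples: int = 100      # the report's "100+ examples per schema"
    tokens_per_example: int = 2000    # hypothetical
    retrains_per_month: int = 1       # 4 approximates weekly schema changes
    curation_cost_per_retrain: float = 0.0  # hypothetical labor cost in USD


def monthly_few_shot_cost(w: Workload, days: int = 30) -> float:
    """Every request pays for the task tokens plus the few-shot examples."""
    tokens = w.requests_per_day * days * (w.base_tokens + w.few_shot_tokens)
    return tokens / 1_000_000 * FEW_SHOT_PRICE_PER_M


def monthly_fine_tuned_cost(w: Workload, days: int = 30) -> float:
    """Requests pay only for task tokens; each retrain adds a fixed cost."""
    inference_tokens = w.requests_per_day * days * w.base_tokens
    inference = inference_tokens / 1_000_000 * FINE_TUNED_PRICE_PER_M
    training = w.training_examples * w.tokens_per_example / 1000 * TRAINING_PRICE_PER_K
    per_retrain = training + w.curation_cost_per_retrain
    return inference + w.retrains_per_month * per_retrain


if __name__ == "__main__":
    stable = Workload(requests_per_day=50)
    volatile = Workload(requests_per_day=50, retrains_per_month=4)
    for name, w in [("stable", stable), ("volatile", volatile)]:
        print(f"{name}: few-shot ${monthly_few_shot_cost(w):.2f}, "
              f"fine-tuned ${monthly_fine_tuned_cost(w):.2f}")
```

With `curation_cost_per_retrain` left at zero the sketch counts token spend only, which likely understates the fixed cost the report describes. Setting it to a realistic labor figure and raising `retrains_per_month` shows how quickly schema volatility erodes the savings.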
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:14:31.528493+00:00— report_created — created