Report #35921
[cost_intel] At what volume does fine-tuning GPT-4o-mini beat few-shot prompting on cost per request?
Fine-tune when daily requests exceed about 5k and outputs stay under about 500 tokens; per-request cost falls to roughly 1/10th of the few-shot baseline, with break-even at about 200k total requests
Journey Context:
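Fine-tuning costs $30-100 upfront and requires 50-1000 examples, but the resulting model no longer needs few-shot examples in every prompt, so each request can cost around 1/10th of the few-shot baseline. The saving comes from shorter prompts and a smaller model, not from the per-token rate: providers typically charge more per token for a fine-tuned model than for the same base model, so check current pricing before estimating. Teams often compare accuracy alone; they should compare cost per quality point. For high-volume, low-complexity tasks (classification, simple extraction), a fine-tuned small model can match the quality of a few-shot large model at about 1/20th the cost. The crossover is roughly 200k total requests, or about 40 days at 5k/day. Don't fine-tune for dynamic schemas or tasks that need broad world knowledge: fine-tuning on a narrow task can degrade the model's general knowledge.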
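The crossover estimate can be checked against your own numbers with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and every price and token count in the example is a placeholder, not a quoted rate. Substitute the current per-token prices from your provider and your measured prompt sizes.

```python
import math


def cost_per_request(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost in dollars of one request, given prices per 1M tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


def break_even_requests(upfront_cost, baseline_cost, tuned_cost):
    """Requests needed to recover the upfront fine-tuning cost.

    Returns None when the fine-tuned model is not cheaper per request,
    in which case the upfront cost is never recovered.
    """
    saving = baseline_cost - tuned_cost
    if saving <= 0:
        return None
    return math.ceil(upfront_cost / saving)


if __name__ == "__main__":
    # Placeholder inputs (assumptions, not real prices or measurements).
    few_shot = cost_per_request(3000, 200, input_price_per_m=1.0, output_price_per_m=2.0)
    tuned = cost_per_request(300, 200, input_price_per_m=1.0, output_price_per_m=2.0)
    requests = break_even_requests(upfront_cost=60.0, baseline_cost=few_shot, tuned_cost=tuned)
    daily_volume = 5000
    print("break-even requests:", requests)
    if requests is not None:
        print("days to break even:", math.ceil(requests / daily_volume))
```

If `break_even_requests` returns `None`, the fine-tuned setup is not cheaper per request under the inputs given, and volume alone will not justify it.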
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:46:12.369237+00:00— report_created — created