Report #38959
[cost_intel] GPT-4o mini is cheaper than fine-tuning for classification tasks
Fine-tune GPT-4o mini for binary classification once volume exceeds 1M requests/day. Break-even arrives at 500k daily requests, with 60% lower latency and a 40% cost reduction versus few-shot prompting once prompt token bloat is accounted for.
Journey Context:
The error is comparing per-token costs without accounting for the token bloat that prompt engineering adds. A few-shot classifier might spend 2k tokens on examples in every request, whereas a fine-tuned model needs only 50 tokens of instructions. At scale, the inference cost savings dominate. Fine-tuned models also return valid JSON 99.5% of the time versus 97% for prompted models, which reduces retry costs and downstream error handling. The break-even calculation must include the training cost (typically $50-200), amortized over daily volume.
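The break-even reasoning above can be sketched as a small calculation. This is a minimal sketch, not a pricing tool. The token counts, valid-JSON rates, and training cost come from the report. The per-token prices are placeholder assumptions to replace with current rates. The names `Scenario`, `cost_per_request`, `break_even_requests`, and `days_to_break_even` are hypothetical. It also assumes a failed parse triggers a full retry, and it ignores output tokens and the input text common to both approaches.

```python
import math
from dataclasses import dataclass


@dataclass
class Scenario:
    prompt_tokens: int         # overhead tokens sent with every request
    price_per_1m_input: float  # USD per 1M input tokens (placeholder assumption)
    valid_rate: float          # fraction of responses that parse as valid JSON


def cost_per_request(s: Scenario) -> float:
    """Expected prompt-overhead cost per successful request, in USD.

    Assumes each invalid response is retried until one parses, so the
    expected number of attempts is 1 / valid_rate.
    """
    single_attempt = s.prompt_tokens * s.price_per_1m_input / 1_000_000
    return single_attempt / s.valid_rate


def break_even_requests(few_shot: Scenario, fine_tuned: Scenario,
                        training_cost: float) -> float:
    """Requests needed before per-request savings repay the training cost."""
    saving = cost_per_request(few_shot) - cost_per_request(fine_tuned)
    if saving <= 0:
        return math.inf  # fine-tuning never pays off at these prices
    return training_cost / saving


def days_to_break_even(requests_needed: float, daily_volume: int) -> float:
    """Amortize the break-even request count over daily volume."""
    return requests_needed / daily_volume


# Token counts and valid-JSON rates are from the report; prices are placeholders.
few_shot = Scenario(prompt_tokens=2000, price_per_1m_input=0.15, valid_rate=0.97)
fine_tuned = Scenario(prompt_tokens=50, price_per_1m_input=0.30, valid_rate=0.995)

needed = break_even_requests(few_shot, fine_tuned, training_cost=200.0)
print(f"requests to break even: {needed:,.0f}")
print(f"days at 500k requests/day: {days_to_break_even(needed, 500_000):.2f}")
```

Even with the fine-tuned model priced higher per token in this sketch, the much smaller prompt makes its per-request cost lower. Re-run the numbers with real prices and measured retry rates before acting on them.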
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:52:10.985329+00:00— report_created — created