Report #21175
[cost_intel] At what volume does fine-tuning GPT-4o mini become cheaper than few-shot prompting?
Fine-tune only when monthly inference on the specific task exceeds 50 million tokens AND the base prompt carries more than 2,000 tokens of examples; otherwise, dynamic few-shot with vector retrieval is roughly 3x cheaper, because fine-tuning adds training costs and a higher per-token inference rate.
Journey Context:
Fine-tuning incurs upfront training costs ($30-100) and higher per-token inference rates (fine-tuned variants cost 2x the base model). The economic win comes from eliminating few-shot examples from the prompt. Break-even analysis: if you send 1,000 tokens of examples per request, eliminating them saves approximately $0.0015 per request on GPT-4o mini (a figure that implies an input rate of about $1.50 per million tokens, so recompute it from current pricing). At a $30 training cost, that is $30 / $0.0015 = 20,000 requests to recoup training, or 20,000 requests per month for a one-month payback, and the 2x rate on the remaining prompt and output tokens pushes the real break-even higher. However, if your task requires only 200 tokens of context (RAG retrieval), the savings are negligible. Common trap: fine-tuning for classification tasks where a 3-example prompt already achieves 95% accuracy. Use fine-tuning only when latency matters (shorter prompts = faster TTFT), examples are massive (10,000+ tokens), or weekly volume exceeds 100,000 calls.
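The break-even arithmetic above can be sketched as a small calculator. This is a minimal sketch, not a pricing tool: the `Scenario` fields and function names are hypothetical, the input rate of $1.50 per million tokens is chosen only because it reproduces the $0.0015 figure above, and the output rate is a placeholder. Replace both with current published pricing before relying on the result. Unlike the simple estimate, it also charges the 2x fine-tuned rate on the tokens that remain in the prompt and on the output.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scenario:
    example_tokens: int          # few-shot tokens that fine-tuning removes
    other_input_tokens: int      # instructions + user input, sent either way
    output_tokens: int           # completion tokens per request
    base_in_rate: float = 1.5    # USD per 1M input tokens (ASSUMED placeholder)
    base_out_rate: float = 6.0   # USD per 1M output tokens (ASSUMED placeholder)
    ft_multiplier: float = 2.0   # fine-tuned rate = 2x base, per the report
    training_cost: float = 30.0  # one-time USD, low end of the report's range


def cost_per_million_requests(in_tokens: int, out_tokens: int,
                              in_rate: float, out_rate: float) -> float:
    """USD for one million requests; rates are USD per 1M tokens."""
    return in_tokens * in_rate + out_tokens * out_rate


def break_even_requests(s: Scenario) -> Optional[int]:
    """Requests needed to recoup training, or None if fine-tuning never pays."""
    few_shot = cost_per_million_requests(
        s.example_tokens + s.other_input_tokens, s.output_tokens,
        s.base_in_rate, s.base_out_rate)
    tuned = cost_per_million_requests(
        s.other_input_tokens, s.output_tokens,
        s.base_in_rate * s.ft_multiplier, s.base_out_rate * s.ft_multiplier)
    saving = few_shot - tuned  # USD saved per million requests
    if saving <= 0:
        return None  # the higher fine-tuned rate outweighs the shorter prompt
    return math.ceil(s.training_cost * 1_000_000 / saving)


if __name__ == "__main__":
    # The report's simplified case: only the 1,000 example tokens are counted.
    print(break_even_requests(Scenario(1000, 0, 0)))
    # Same examples, plus 300 prompt tokens and 50 output tokens at the 2x rate.
    print(break_even_requests(Scenario(1000, 300, 50)))
    # Short RAG-style context: 200 example tokens are too few to pay off.
    print(break_even_requests(Scenario(200, 300, 100)))
```

Under these assumed rates, the first call reproduces the 20,000-request figure, the second shows the break-even doubling once the remaining tokens are billed at the fine-tuned rate, and the third returns `None`, matching the observation that short prompts gain nothing from fine-tuning.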
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:56:45.133409+00:00— report_created — created