Report #24569
[cost\_intel] Fine-tuning is for quality; prompting with few-shot is cheaper for low volume
Fine-tune when monthly throughput exceeds 10M tokens on classification/extraction tasks; do not fine-tune for open-ended generation, where prompting is cheaper at every scale.
Journey Context:
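OpenAI charges a 4-8x premium for fine-tuned inference over base models \($8-80/M vs $2-20/M tokens\). In exchange, fine-tuning eliminates the 'prompt tax': roughly 2,000 tokens of few-shot examples and complex CoT instructions sent with every request.
Break-even math \(one blended rate for input and output tokens, the 8x end of the premium, and only the prompt overhead counted, not the input being classified\): for a classification task \(1-token output\) with 2k tokens of prompt overhead, the base model costs 2001 × $3/1M ≈ $0.006 per request and the fine-tuned model costs 1 × $24/1M = $0.000024, a saving of about $0.006 per request. The reported break-even is 250 requests/day \(about 15M base-model tokens/month\). At that volume the saving is roughly $45/month, so the figure presumably reflects fixed fine-tuning costs of about that size, which this report does not itemize.
For generation tasks \(500-token output\), the base model costs 2500 × $3/1M = $0.0075 per request and the fine-tuned model costs 500 × $24/1M = $0.012, so fine-tuning is the more expensive option. At these rates the crossover is an output of about 286 tokens \(2000 × 3 / 21\); beyond that length, fine-tuning does not win on cost at any volume.
The quality myth: fine-tuning improves consistency \(format adherence\) but rarely surpasses frontier prompting on reasoning. Use it only for high-volume structured extraction.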
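The per-request arithmetic above can be reproduced with a short Python sketch. It reuses the report's illustrative figures \($3/M base, $24/M fine-tuned, 2,000 tokens of prompt overhead\) and its simplification of one blended price for input and output tokens. The function names are hypothetical, and the fixed monthly cost passed to `break_even_requests_per_day` is an assumption chosen for illustration, not a figure from the report or from any published price list.

```python
TOKENS_PER_MILLION = 1_000_000


def request_cost(prompt_tokens: int, output_tokens: int, price_per_million: float) -> float:
    """Dollar cost of one request at a single blended price per million tokens."""
    return (prompt_tokens + output_tokens) * price_per_million / TOKENS_PER_MILLION


def crossover_output_tokens(prompt_overhead: int, base_price: float, tuned_price: float) -> float:
    """Output length at which the base and fine-tuned models cost the same.

    Solves (overhead + n) * base_price = n * tuned_price for n.
    """
    return prompt_overhead * base_price / (tuned_price - base_price)


def break_even_requests_per_day(fixed_monthly_cost: float,
                                saving_per_request: float,
                                days_per_month: int = 30) -> float:
    """Daily volume at which per-request savings cover a fixed monthly cost.

    fixed_monthly_cost is a hypothetical input (training, hosting, upkeep).
    """
    return fixed_monthly_cost / (saving_per_request * days_per_month)


if __name__ == "__main__":
    base_price, tuned_price, overhead = 3.0, 24.0, 2000  # figures from the report

    # Classification: 1-token output.
    cls_base = request_cost(overhead, 1, base_price)
    cls_tuned = request_cost(0, 1, tuned_price)
    print(f"classification: base ${cls_base:.6f}, fine-tuned ${cls_tuned:.6f}")

    # Generation: 500-token output.
    gen_base = request_cost(overhead, 500, base_price)
    gen_tuned = request_cost(0, 500, tuned_price)
    print(f"generation:     base ${gen_base:.6f}, fine-tuned ${gen_tuned:.6f}")

    # Output length beyond which fine-tuning loses on per-request cost.
    print(f"crossover output length: {crossover_output_tokens(overhead, base_price, tuned_price):.0f} tokens")

    # Assumed $45/month fixed cost, for illustration only.
    per_day = break_even_requests_per_day(45.0, cls_base - cls_tuned)
    print(f"break-even volume: {per_day:.0f} requests/day")
```

Swapping in separate input and output prices, or counting the tokens of the input being classified, would change these numbers, so treat the output as a sanity check on the report's reasoning, not as a pricing model.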
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:38:41.093334+00:00— report_created — created