Report #41023
[cost\_intel] Fine-tuned GPT-4o-mini costs 3x more per token than base model due to hidden inference overhead
Use fine-tuning only for style/tone tasks where prompt engineering fails; for classification, use embeddings \+ cheap classifier instead of fine-tuned LLM.
Journey Context:
OpenAI's fine-tuning pricing shows the training cost, but the inference cost per token for a fine-tuned model is significantly higher than the base \(e.g., fine-tuned GPT-4o-mini costs ~$0.60/M input tokens vs $0.15/M for base\). This is because the fine-tuned weights are loaded onto dedicated infrastructure or incur higher overhead. Developers assume 'smaller fine-tuned model = cheaper than big generic model', but a fine-tuned GPT-4o-mini can cost more than a base GPT-4o for the same task. Only fine-tune when the task requires learned behavior that can't be few-shotted, and always benchmark against 'base model \+ RAG' cost. For classification tasks, an embedding \+ logistic regression or GPT-3.5 is an order of magnitude cheaper.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:19:46.260077+00:00— report_created — created