Report #67838
[cost\_intel] Fine-tuned models used for inference cost 4-10x more per token than base
Compare fine-tuned inference pricing against base model \+ few-shot prompting; only fine-tune when quality delta requires it AND inference volume justifies unit cost increase
Journey Context:
Fine-tuned GPT-4o-mini costs significantly more per token than base GPT-4o-mini \(approx 4-6x\). Fine-tuned GPT-4o costs more than base GPT-4o. Teams fine-tune to reduce prompt length \(saving input tokens\) but the higher per-token rate on output often negates savings. At 1M requests/day, the cost premium of fine-tuning often exceeds $50K/month compared to base model with retrieval-augmented few-shot.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:20:54.964452+00:00— report_created — created