Report #62274
[cost\_intel] Fine-tuned model inference costs 3-7x base model rates, erasing prompt savings for low-volume workloads
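Calculate break-even volume: fine-tuning pays off when \(Training\_Cost \+ \(FT\_Rate \* FT\_Tokens\_Per\_Day \* Days\)\) < \(Base\_Rate \* Base\_Tokens\_Per\_Day \* Days\), where FT\_Tokens\_Per\_Day reflects the shortened prompt. Per-request savings are positive only when the prompt reduction exceeds 1 \- 1/markup \(about 86% at a 7.2x markup, 67% at 3x\). Only fine-tune when that holds and volume is high enough \(>100k requests/day\) to amortize training; otherwise use few-shot prompting with retrieval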
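The break-even inequality can be solved for Days directly. The following is a minimal sketch, not a definitive calculator: the function name `break_even_days` and its parameters are hypothetical, rates are assumed to be in dollars per 1M tokens, and only prompt (input) tokens are counted.

```python
from typing import Optional


def break_even_days(
    training_cost: float,
    ft_rate_per_m: float,
    base_rate_per_m: float,
    ft_tokens_per_day: float,
    base_tokens_per_day: float,
) -> Optional[float]:
    """Days until fine-tuning becomes cheaper than the base model.

    Returns None when the fine-tuned model costs at least as much per day
    as the base model, in which case training cost is never recovered.
    """
    ft_daily = ft_rate_per_m * ft_tokens_per_day / 1_000_000
    base_daily = base_rate_per_m * base_tokens_per_day / 1_000_000
    daily_saving = base_daily - ft_daily
    if daily_saving <= 0:
        return None
    return training_cost / daily_saving


# Assumed workload: 1,000 requests/day, 2,000-token prompt cut to 200 tokens.
days = break_even_days(800, 3.60, 0.50, 200_000, 2_000_000)
print(days)

# A 50% prompt reduction at a 7.2x markup never breaks even.
print(break_even_days(800, 3.60, 0.50, 1_000_000, 2_000_000))
```

Returning `None` rather than a negative number makes the "never pays off" case explicit for callers.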
Journey Context:
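OpenAI charges $8/1M tokens for GPT-3.5 fine-tune training, and fine-tuned inference is $3.60/1M vs $0.50/1M for base \(7.2x markup\). If fine-tuning reduces a 2k-token prompt to 200 tokens \(90% savings\), the cost per 1,000 requests is \(200k tokens \* $3.60/1M\) = $0.72 vs \(2M tokens \* $0.50/1M\) = $1.00, saving $0.28 per 1,000 requests \($0.00028 per request\). Training cost scales with trained tokens: at $8/1M, 100k trained tokens cost only $0.80, while an $800 training bill corresponds to 100M trained tokens. Break-even on $800 is 800/0.28 = 2,857 thousand requests, or about 2.86 million: roughly 29 days at 100k requests/day but almost 8 years at 1k requests/day. For low-volume internal tools \(<1k requests/day\), do not fine-tune to save cost. For high-volume consumer apps \(>100k/day\), the math works. Also consider latency: fine-tuned models often have higher latency than base.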
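The per-request arithmetic can be checked with a short script. This is a sketch using the rates quoted in this report, in dollars per 1M tokens. The helper names are hypothetical, and output tokens are ignored, as they are in the report.

```python
def cost_per_request(tokens: int, rate_per_m: float) -> float:
    """Dollar cost of one request's prompt tokens at a per-1M-token rate."""
    return tokens * rate_per_m / 1_000_000


def min_prompt_reduction(markup: float) -> float:
    """Fractional prompt reduction at which fine-tuned and base cost are equal."""
    return 1.0 - 1.0 / markup


def break_even_requests(training_cost: float, saving_per_request: float) -> float:
    """Requests needed for cumulative savings to cover the training cost."""
    if saving_per_request <= 0:
        raise ValueError("fine-tuned requests are not cheaper; no break-even")
    return training_cost / saving_per_request


base = cost_per_request(2_000, 0.50)  # base model, full 2k-token prompt
ft = cost_per_request(200, 3.60)      # fine-tuned model, 200-token prompt
saving = base - ft

print(f"base ${base:.5f}, fine-tuned ${ft:.5f}, saving ${saving:.5f} per request")
print(f"break-even on $0.80 training: {break_even_requests(0.80, saving):,.0f} requests")
print(f"break-even on $800 training:  {break_even_requests(800, saving):,.0f} requests")
print(f"reduction needed at 7.2x markup: {min_prompt_reduction(3.60 / 0.50):.1%}")
```

With these inputs the saving is about $0.00028 per request, so break-even is a few thousand requests if training costs $0.80 and a few million if it costs $800.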
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:00:54.088236+00:00— report_created — created