Report #55395
[cost_intel] Fine-tuning vs few-shot prompting cost inflection point
Fine-tuning beats dynamic few-shot prompting on cost-per-quality when three conditions hold together: task volume exceeds 100k requests/month, the domain vocabulary is specialized (medical/legal), and the output format is rigid (e.g., specific ICD-10 codes). Below this threshold, RAG-based few-shot prompting with GPT-4o-mini is cheaper and more flexible. The break-even accounts for a one-time training cost (~$5-10 per 100k examples) and assumes per-token inference price parity between the fine-tuned and base models.
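The rule of thumb above can be written as a small check. This is a minimal sketch, not a definitive policy: the function name `prefer_fine_tuning` and its parameters are hypothetical, and the 100k default is simply the threshold stated in this report.

```python
def prefer_fine_tuning(
    requests_per_month: int,
    specialized_vocabulary: bool,
    rigid_output_format: bool,
    volume_threshold: int = 100_000,  # threshold stated in this report
) -> bool:
    """Return True only when all three conditions from the report hold.

    Hypothetical helper for illustration; the inputs are judgment calls
    the team has to make about its own task.
    """
    return (
        requests_per_month > volume_threshold  # "exceeds", so strictly greater
        and specialized_vocabulary
        and rigid_output_format
    )


if __name__ == "__main__":
    # High-volume ICD-10 coding task: all three conditions hold.
    print(prefer_fine_tuning(250_000, True, True))
    # Same task at low volume: stay with RAG-based few-shot prompting.
    print(prefer_fine_tuning(50_000, True, True))
```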
Journey Context:
Teams often default to fine-tuning to 'make the model understand our data,' treating it as a quality improvement. In practice, fine-tuning is primarily a latency and cost optimization for high-volume, stable tasks. The economics, using the prices this report assumes: fine-tuning GPT-4o-mini costs ~$1.00 per 1M training tokens (one-time) plus $0.60 per 1M inference tokens (the same $0.60 as the base model). The saving comes from prompt length: a fine-tuned model performs the task with a ~100-token prompt instead of ~2,000 tokens of few-shot examples, which saves 1,900 tokens per request. At 100k requests/month, that is 190M tokens saved, worth ~$114/month, against a one-time training cost of $5-10. The 'journey' mistake is fine-tuning a task that changes frequently (e.g., extracting fields from a UI that is redesigned quarterly) or that runs at low volume (<10k/month), where the training cost and rigidity outweigh the per-request savings. Fine-tuning wins when the task is a commodity operation with >100k/month volume and a static output schema.
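The arithmetic in this paragraph can be reproduced with a short calculation. This is a sketch under the report's own figures (2,000 vs 100 prompt tokens, $0.60 per 1M inference tokens, $5-10 one-time training cost); the `Scenario` class and function names are hypothetical, and the prices should be treated as assumptions rather than current list prices.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Inputs taken from the report; all prices are the report's assumptions."""
    requests_per_month: int
    few_shot_prompt_tokens: int = 2000    # prompt with few-shot examples
    fine_tuned_prompt_tokens: int = 100   # prompt for the fine-tuned model
    inference_price_per_m: float = 0.60   # USD per 1M tokens (assumed parity)
    training_cost: float = 10.0           # USD, upper end of the $5-10 range


def tokens_saved_per_month(s: Scenario) -> int:
    """Prompt tokens no longer sent each month after fine-tuning."""
    per_request = s.few_shot_prompt_tokens - s.fine_tuned_prompt_tokens
    return per_request * s.requests_per_month


def monthly_savings(s: Scenario) -> float:
    """USD saved per month on prompt tokens alone."""
    return tokens_saved_per_month(s) / 1_000_000 * s.inference_price_per_m


def payback_months(s: Scenario) -> float:
    """Months of savings needed to cover the one-time training cost."""
    savings = monthly_savings(s)
    return float("inf") if savings <= 0 else s.training_cost / savings


if __name__ == "__main__":
    for volume in (10_000, 100_000):
        s = Scenario(requests_per_month=volume)
        print(
            f"{volume:>7} req/month: "
            f"{tokens_saved_per_month(s):,} tokens saved, "
            f"${monthly_savings(s):.2f}/month, "
            f"payback {payback_months(s):.2f} months"
        )
```

The sketch covers token costs only. It does not model retraining after the task changes or the engineering time around it, which is the rigidity the report cites as the reason low-volume or frequently changing tasks do not pay off.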
Lifecycle
2026-06-19T23:28:20.587610+00:00— report_created — created