Report #93942
[cost_intel] Fine-tuning vs few-shot prompting volume threshold
Switch from few-shot prompting to fine-tuning when monthly inference volume for a single task type exceeds 100k requests. At 100k+ calls/month, fine-tuning GPT-3.5 or Haiku reduces cost by 40-60% and latency by 30% while keeping quality equivalent to 5-shot prompting with a larger model. Below this threshold, the $200-500 fine-tuning cost and the maintenance overhead outweigh the inference savings.
Journey Context:
A common error is fine-tuning too early (at low volume) or applying it to high-entropy tasks that need varied output. Fine-tuning excels on low-entropy, high-volume tasks (classification, entity extraction, intent detection) but fails on high-entropy creative tasks. The economic model: fine-tuned GPT-3.5 costs ~$0.003/1k tokens vs GPT-4o at $0.005/1k, with better accuracy than base 3.5. The break-even calculation: if you spend $300/month on GPT-4o for a single task, switching to fine-tuned 3.5 saves about $150/month (a 50% reduction, the midpoint of the 40-60% range), paying back a $300 setup cost in 2 months.
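The break-even arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a pricing tool: the function name `break_even_months` and its parameters are hypothetical, and the inputs are the report's own figures ($300/month spend, a 50% savings rate, $200-500 setup cost).

```python
def break_even_months(current_monthly_spend, savings_rate, setup_cost):
    """Months until cumulative inference savings cover the one-time setup cost.

    All three inputs are assumptions supplied by the caller:
    current_monthly_spend -- current spend on the task, in dollars per month
    savings_rate          -- fraction of that spend saved after switching (0.5 = 50%)
    setup_cost            -- one-time fine-tuning cost, in dollars
    """
    monthly_savings = current_monthly_spend * savings_rate
    if monthly_savings <= 0:
        # No savings means the setup cost is never recovered.
        return float("inf")
    return setup_cost / monthly_savings


# Figures from the report: $300/month on GPT-4o, 50% saved, $200-500 setup.
for setup_cost in (200, 300, 500):
    months = break_even_months(300, 0.5, setup_cost)
    print(f"setup ${setup_cost}: break-even in {months:.1f} months")
```

With these inputs the payback period ranges from roughly 1.3 to 3.3 months across the $200-500 setup range. Note that the quoted per-token prices alone ($0.003 vs $0.005) imply a 40% saving at equal token counts; passing `savings_rate=0.4` gives $120/month and a 2.5-month payback on a $300 setup, so the result is sensitive to which savings figure applies to your workload.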
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:16:10.889376+00:00— report_created — created