Report #69124
[cost_intel] Fine-tuning vs prompting break-even at 50k examples
Fine-tune small models (GPT-3.5-turbo/Haiku) only when you have >50,000 training examples, a stable schema, and a task that requires <100ms latency. Below this threshold, few-shot frontier models are cheaper once you account for the fixed cost of training ($500-2000).
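The rule above is a three-part checklist, and all three conditions must hold. A minimal sketch follows. `Workload` and `should_fine_tune` are hypothetical names, and the thresholds are the report's rules of thumb, not measured constants.

```python
from dataclasses import dataclass

# Thresholds from the report's rule of thumb; adjust them to your own workload.
MIN_TRAINING_EXAMPLES = 50_000
MAX_LATENCY_MS = 100


@dataclass
class Workload:
    training_examples: int   # labelled examples available for fine-tuning
    schema_stable: bool      # True if the output schema is not expected to change
    latency_budget_ms: int   # latency the task must meet, in milliseconds


def should_fine_tune(w: Workload) -> bool:
    """Return True only when all three of the report's conditions hold."""
    return (
        w.training_examples > MIN_TRAINING_EXAMPLES
        and w.schema_stable
        and w.latency_budget_ms < MAX_LATENCY_MS
    )
```

For example, `should_fine_tune(Workload(60_000, True, 80))` is `True`. Dropping to 20,000 examples, marking the schema as unstable, or relaxing the budget to 2,000 ms makes it `False`.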
Journey Context:
The naive math: fine-tuned GPT-3.5-turbo costs $0.003/1K tokens vs. $0.03/1K for GPT-4, making it 10x cheaper per token. But training on 50k examples costs ~$1000-2000 in API fees. At 10k inference requests/day, you save $0.27/day. Break-even is ~370 days, almost a year. And that assumes your schema never changes (if you add one field, you retrain and burn another $1000).

The 'hard-won' insight is that fine-tuning is not about token cost; it is about latency and consistency. Fine-tuned models remove the need for long few-shot examples in the prompt (reducing latency from 2s to 200ms), and they are more consistent on narrow distributions (e.g., your specific JSON format).

Use fine-tuning for high-volume (>100k req/day), stable-schema, latency-sensitive tasks. Use few-shot frontier models for evolving schemas or low-volume (<10k req/day) tasks.
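The break-even arithmetic is worth rerunning with your own traffic. At the quoted prices, every 1K tokens moved to the fine-tuned model saves $0.027, so the daily saving, and with it the break-even point, scales linearly with both requests/day and tokens per request. Check tokens per request first. Below is a minimal sketch. The function names are hypothetical, the default prices are the per-1K-token figures quoted above (check current pricing), and it assumes input and output tokens are priced the same. Frontier and fine-tuned token counts are separate parameters because the frontier prompt carries few-shot examples that the fine-tuned prompt no longer needs.

```python
import math


def daily_savings(requests_per_day: int,
                  frontier_tokens_per_request: int,
                  finetuned_tokens_per_request: int,
                  frontier_price_per_1k: float = 0.03,    # assumed: quoted GPT-4 price
                  finetuned_price_per_1k: float = 0.003   # assumed: quoted fine-tuned 3.5 price
                  ) -> float:
    """Dollars saved per day by serving the traffic on the fine-tuned model.

    Assumes one flat price per 1K tokens (input and output priced alike).
    """
    frontier_cost = (requests_per_day * frontier_tokens_per_request / 1000
                     * frontier_price_per_1k)
    finetuned_cost = (requests_per_day * finetuned_tokens_per_request / 1000
                      * finetuned_price_per_1k)
    return frontier_cost - finetuned_cost


def break_even_days(training_cost: float, savings_per_day: float) -> float:
    """Days until cumulative token savings cover the one-off training cost."""
    if savings_per_day <= 0:
        return math.inf  # never pays for itself on token cost alone
    return training_cost / savings_per_day
```

For example, `break_even_days(1000, daily_savings(100, 500, 500))` comes out at roughly 741 days, and doubling the traffic halves it. To model a schema change, add the retraining cost to `training_cost` and rerun.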
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:30:28.675444+00:00— report_created — created