Report #68880
[cost\_intel] Fine-tuning 3.5-turbo vs GPT-4o prompting cost-quality break-even
Fine-tune 3.5-turbo for structured-output tasks that require >95% schema compliance at >50k requests/day. Training cost: ~$2-5k \(500-1k examples\). Inference: $3/1M tokens vs $15/1M tokens for GPT-4o \(5x cheaper\). Break-even: about 30 days at 100k requests/day. Quality: matches GPT-4o on a narrow domain but fails on out-of-distribution inputs.
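The break-even figure depends on tokens per request, which this report does not state. The sketch below is one way to check the arithmetic under explicit assumptions: the function name, the 100 tokens/request figure, and the $3,600 training cost \(a point inside the reported $2-5k range\) are illustrative choices, not values from the report. Prices default to the $3/1M and $15/1M rates quoted above.

```python
import math


def break_even_days(
    training_cost_usd: float,
    requests_per_day: int,
    tokens_per_request: int,           # assumption: not given in the report
    ft_price_per_m: float = 3.0,       # fine-tuned 3.5-turbo, $ per 1M tokens
    base_price_per_m: float = 15.0,    # GPT-4o, $ per 1M tokens
    monthly_maintenance_usd: float = 0.0,
) -> float:
    """Days until inference savings repay the up-front training cost."""
    millions_of_tokens_per_day = requests_per_day * tokens_per_request / 1_000_000
    daily_savings = millions_of_tokens_per_day * (base_price_per_m - ft_price_per_m)
    # Spread the recurring retraining cost over a 30-day month.
    daily_savings -= monthly_maintenance_usd / 30
    if daily_savings <= 0:
        return math.inf  # fine-tuning never pays for itself at this volume
    return training_cost_usd / daily_savings


# Hypothetical scenario: 100k requests/day at 100 tokens each saves $120/day.
print(break_even_days(3600, 100_000, 100))
# Same scenario with $200/month of retraining folded in.
print(break_even_days(3600, 100_000, 100, monthly_maintenance_usd=200))
```

Under these assumed inputs the 30-day figure holds only for short requests of roughly 100 tokens. Longer prompts shorten the payback period proportionally, so it is worth plugging in measured token counts before committing.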
Journey Context:
Common error: fine-tuning for knowledge retrieval \(use RAG instead\) or for one-shot tasks. Fine-tuning excels at consistent formatting \(valid JSON without retries\), tone adherence, and classification. Signal to switch: if GPT-4o needs >2 retries to reach schema compliance, fine-tuning likely wins. Maintenance cost: retrain monthly \(~$200/month\) to prevent drift; otherwise accuracy drops 5-10% as the input distribution shifts.
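The retry signal above can be measured from logged responses before any training money is spent. The following is a minimal sketch that assumes each request's attempts were logged as raw strings in order, and that schema compliance means "parses as a JSON object containing a set of required keys". The function names and the log shape are hypothetical; a real schema check would likely be stricter.

```python
import json


def is_compliant(raw: str, required_keys: set) -> bool:
    """True if raw parses as a JSON object holding every required key."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and required_keys.issubset(obj)


def retries_needed(attempts: list, required_keys: set) -> int:
    """Count failed attempts before the first compliant one."""
    for index, raw in enumerate(attempts):
        if is_compliant(raw, required_keys):
            return index
    return len(attempts)  # no attempt ever complied


def mean_retries(logged_requests: list, required_keys: set) -> float:
    """Average retries per request across a log of attempt lists."""
    counts = [retries_needed(a, required_keys) for a in logged_requests]
    return sum(counts) / len(counts) if counts else 0.0


# Hypothetical log: one list of raw model outputs per request.
log = [
    ['{"label": "spam"}'],
    ['not json', '{"label": "ham"}'],
    ['oops', '{"wrong": 1}', '[1, 2]', '{"label": "spam"}'],
]
print(mean_retries(log, {"label"}))
```

A mean above 2 on a representative sample would match the report's threshold. Whether to use the mean or a high percentile is a judgment call the report does not settle.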
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:05:49.573459+00:00— report_created — created