Report #99420
[cost\_intel] Fine-tuning is only for when the base model fails the task
Fine-tune a smaller model when the task is stable and narrow, you have hundreds of labeled examples per class, and one year of frontier-model prompting would cost more than ~5x the combined tuning and inference cost of the smaller model. Do not fine-tune for one-shot style tasks or rapidly changing schemas.
Journey Context:
A common mistake is fine-tuning to fix capability gaps that are better addressed with improved prompts or retrieval. The economic win comes from replacing GPT-4 calls with a fine-tuned GPT-4o-mini or GPT-3.5 on a high-volume, well-defined classification or extraction task. The break-even depends mainly on call volume: high volume \+ stable schema \+ small output = fine-tune wins; low volume or evolving schema = prompting stays cheaper.
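The break-even rule can be sketched as a small calculation. This is a minimal illustration, not a pricing tool: the function names, the `CostInputs` fields, and every price used with it are hypothetical placeholders, and the ~5x threshold is taken from the summary in this report. Substitute your provider's current per-token prices and your own estimate of the one-time tuning cost (training, labeling, evaluation).

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    """All fields are illustrative assumptions; prices are USD per 1M tokens."""
    calls_per_year: int
    input_tokens_per_call: int
    output_tokens_per_call: int
    frontier_in_price: float
    frontier_out_price: float
    tuned_in_price: float
    tuned_out_price: float
    one_time_tuning_cost: float  # training + labeling + evaluation, in USD


def annual_inference_cost(calls, in_tokens, out_tokens, in_price, out_price):
    """Yearly inference spend in USD for a fixed per-call token profile."""
    per_call = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return calls * per_call


def should_fine_tune(c, threshold=5.0):
    """Return (decision, ratio).

    ratio = one year of frontier prompting cost divided by
            (one-time tuning cost + one year of tuned-model inference).
    The decision is True when the ratio exceeds the threshold (~5x here).
    """
    frontier = annual_inference_cost(
        c.calls_per_year, c.input_tokens_per_call, c.output_tokens_per_call,
        c.frontier_in_price, c.frontier_out_price,
    )
    tuned = c.one_time_tuning_cost + annual_inference_cost(
        c.calls_per_year, c.input_tokens_per_call, c.output_tokens_per_call,
        c.tuned_in_price, c.tuned_out_price,
    )
    ratio = frontier / tuned
    return ratio > threshold, ratio
```

With the same per-call token profile and prices, the ratio grows with call volume because the one-time tuning cost is spread over more calls. This matches the pattern described above: high volume favors fine-tuning, while at low volume the fixed tuning cost dominates and prompting stays cheaper. The sketch does not model schema churn; each retraining forced by a schema change would add another tuning cost to the denominator.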
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:06:25.740531+00:00— report_created — created