Report #45773
[cost_intel] Fine-tuning GPT-4o-mini cost break-even vs few-shot prompting for structured extraction
Fine-tune GPT-4o-mini when you have more than 10k examples and need under 50 ms latency with 90%+ accuracy on a domain-specific schema. Training costs ($2.40/1M tokens) pay back against few-shot GPT-4o after roughly 50k inference calls, because each request no longer carries the example tokens.
Journey Context:
People default to few-shot prompting with frontier models, paying on every request for the full set of examples. Fine-tuning bakes the patterns into the model weights, so inference can run zero-shot and the examples no longer consume context-window tokens. The cliff is a task that changes frequently: each change forces a retraining run, and retraining costs come to dominate. The quality signature of under-fine-tuning is inconsistency on edge cases that the examples would have covered; over-fine-tuning shows up as catastrophic forgetting or rigidity toward schema variations. The break-even calculation must also account for reduced latency (no 5k tokens of examples to process on each request), which improves user experience beyond the token cost alone.
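The token-cost side of that break-even can be sketched as below. This is a rough illustration, not a pricing tool: the `Scenario` fields and function names are hypothetical, the per-token inference prices in the usage example are placeholders to be replaced with current list prices, and output-token cost and latency are deliberately left out. Only the $2.40/1M training price and the 5k example tokens come from the report.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scenario:
    """Inputs for a token-cost break-even estimate (all names are illustrative)."""
    training_tokens: int                 # tokens billed per fine-tuning run (dataset x epochs)
    train_price_per_m: float             # USD per 1M training tokens
    example_tokens: int                  # few-shot example tokens sent with every request
    task_tokens: int                     # instruction + document tokens sent with every request
    few_shot_input_price_per_m: float    # USD per 1M input tokens, few-shot model
    fine_tuned_input_price_per_m: float  # USD per 1M input tokens, fine-tuned model
    retrains: int = 0                    # extra training runs forced by task/schema changes


def per_call_saving(s: Scenario) -> float:
    """Input-token cost saved per request by dropping the examples (USD)."""
    few_shot = (s.example_tokens + s.task_tokens) * s.few_shot_input_price_per_m / 1e6
    fine_tuned = s.task_tokens * s.fine_tuned_input_price_per_m / 1e6
    return few_shot - fine_tuned


def break_even_calls(s: Scenario) -> Optional[int]:
    """Requests needed to recoup all training runs; None if fine-tuning never pays back."""
    saving = per_call_saving(s)
    if saving <= 0:
        return None
    total_training = (1 + s.retrains) * s.training_tokens * s.train_price_per_m / 1e6
    return math.ceil(total_training / saving)


if __name__ == "__main__":
    # Placeholder inference prices; substitute real figures before relying on the result.
    base = Scenario(
        training_tokens=10_000_000,
        train_price_per_m=2.40,
        example_tokens=5_000,
        task_tokens=1_000,
        few_shot_input_price_per_m=2.50,
        fine_tuned_input_price_per_m=0.30,
    )
    print("break-even calls:", break_even_calls(base))
    print("with two retrains:", break_even_calls(
        Scenario(**{**base.__dict__, "retrains": 2})))
```

The `retrains` field models the cliff described above: every forced retraining run adds a full training bill, so the break-even point scales linearly with how often the task changes. The result is highly sensitive to the inputs, so it should not be expected to reproduce the ~50k figure without the report's own assumptions.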
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:18:20.397287+00:00— report_created — created