Report #37835
[cost_intel] Prompting frontier models for narrow repetitive tasks at high volume instead of fine-tuning small models
Fine-tune GPT-4o-mini when you have more than 5K examples of a narrow task and more than 100K requests per month. A fine-tuned model no longer needs few-shot examples in the prompt (saving 1-3K input tokens per request) and follows the output format more reliably (reducing retries), making per-request cost 10-50x lower than prompting a frontier model with examples.
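The rule of thumb above can be written as a small check. This is a minimal Python sketch that assumes the thresholds are strict (`>` rather than `>=`). The helper names `should_fine_tune` and `expected_cost_per_success` are hypothetical, and so is the retry model, in which each attempt independently fails format validation and is retried until it passes. None of these come from any SDK.

```python
# Minimal sketch of the report's rule of thumb. Thresholds come from the
# report; the retry model (independent failures, retry until valid) is an
# illustrative assumption, and both helper names are hypothetical.


def should_fine_tune(num_examples: int, monthly_requests: int) -> bool:
    """Report's heuristic: more than 5K task examples and more than 100K requests/month."""
    return num_examples > 5_000 and monthly_requests > 100_000


def expected_cost_per_success(cost_per_attempt: float, failure_rate: float) -> float:
    """Expected spend per valid response when each attempt independently fails
    format validation with probability `failure_rate` and is then retried.

    The number of attempts is geometric, so its expectation is 1 / (1 - failure_rate).
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return cost_per_attempt / (1.0 - failure_rate)


if __name__ == "__main__":
    print(should_fine_tune(10_000, 1_000_000))  # True
    print(should_fine_tune(2_000, 1_000_000))   # False
    # Hypothetical failure rates: 5% for the prompted model, 1% for the fine-tuned one.
    print(expected_cost_per_success(0.01125, 0.05))
    print(expected_cost_per_success(0.00009, 0.01))
```

Under this model, a retry rate p inflates per-request cost by a factor of 1 / (1 - p). A 5% failure rate therefore adds about 5.3%. The failure rates in the example are placeholders, not measurements.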
Journey Context:
Fine-tuning GPT-4o-mini costs roughly $100-300 for 10K training examples. Per-request cost for the fine-tuned model is 2x the base rate ($0.15/M input tokens for fine-tuned vs $0.075/M base).

Compare total input cost per request. Prompting GPT-4o with a 2K-token system prompt + 3 few-shot examples (2K tokens) + 500-token input = 4.5K input tokens at $2.50/M = $0.01125/request. Fine-tuned GPT-4o-mini with 100-token instructions + 500-token input = 600 tokens at $0.15/M = $0.00009/request, which is 125x cheaper. At 1M requests/month, that is $11,250 vs $90, so a $200 fine-tuning run pays back in under 1 day. These figures cover input tokens only; output tokens are billed separately.

Critical caveats:
(1) Fine-tuned models match or exceed prompted frontier models only on the narrow task distribution they were trained on. They do not generalize reliably to out-of-scope inputs.
(2) You need a monitoring pipeline to detect distribution drift.
(3) Fine-tuning is not available for all model tiers. Anthropic fine-tuning is limited access, while OpenAI offers it for GPT-4o-mini and GPT-4o.
(4) Each fine-tuned model is a deployment artifact that needs versioning and rollback.
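The cost comparison above can be checked with a short Python sketch. It uses only the report's prices and token counts and counts input tokens only. The payback estimate assumes a 30-day month. The names `cost_per_request` and `payback_days` are hypothetical helpers.

```python
# Sketch reproducing the report's input-token arithmetic. Prices and token
# counts are the report's figures; output tokens are ignored, as in the report.

PER_MILLION = 1_000_000


def cost_per_request(input_tokens: int, price_per_million: float) -> float:
    """Input-token cost of one request, in dollars."""
    return input_tokens * price_per_million / PER_MILLION


def payback_days(training_cost: float, monthly_savings: float, days_per_month: float = 30.0) -> float:
    """Days until a one-off training cost is recovered by the monthly savings."""
    return training_cost / (monthly_savings / days_per_month)


# Prompted GPT-4o: 2K system prompt + 2K of few-shot examples + 500-token input.
prompted = cost_per_request(2_000 + 2_000 + 500, 2.50)
# Fine-tuned GPT-4o-mini: 100-token instructions + 500-token input.
fine_tuned = cost_per_request(100 + 500, 0.15)

monthly_requests = 1_000_000
prompted_monthly = prompted * monthly_requests
fine_tuned_monthly = fine_tuned * monthly_requests

print(f"per request: ${prompted:.5f} vs ${fine_tuned:.5f} ({prompted / fine_tuned:.0f}x)")
print(f"per month:   ${prompted_monthly:,.0f} vs ${fine_tuned_monthly:,.0f}")
print(f"payback:     {payback_days(200, prompted_monthly - fine_tuned_monthly):.2f} days")
```

Running it gives the report's figures:
- $0.01125 vs $0.00009 per request (125x)
- $11,250 vs $90 per month
- a payback period of roughly half a day

Replace the prices with current list prices and the token counts with your own before drawing conclusions.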
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T17:59:02.361932+00:00— report_created — created