Report #58264
[cost_intel] Prompting frontier models for repetitive narrow tasks at high volume instead of fine-tuning a small model
Fine-tune GPT-4o-mini or Claude Haiku on 500+ task-specific examples when you have a narrow, repetitive workload exceeding 50k requests. On such tasks, fine-tuned small models can match or exceed prompted frontier quality at roughly 20-30x lower inference cost, and they remove the need for long task-specific prompts.
Journey Context:
Every request to a frontier model for a narrow task pays for general capability you do not use. A 1,500-token task-specific prompt on GPT-4o at $2.50/M input tokens costs $0.00375 per request for the prompt alone. Fine-tuning bakes the task pattern into the weights, which reduces or eliminates the prompt overhead and lets you use a cheaper model. A fine-tuned GPT-4o-mini at $0.15/M input plus $0.60/M output can match GPT-4o quality on that specific task at roughly 1/20th the cost. The break-even: fine-tuning costs roughly $50-200 in training compute, so at an assumed $0.05 saved per request it pays for itself after about 1k-4k requests ($50 / $0.05 = 1,000; $200 / $0.05 = 4,000), well before a 50k-request workload completes. Below 10k total requests, the training cost plus the data preparation effort may not amortize. The quality signature where fine-tuning genuinely beats prompting: tasks with a consistent input-output mapping where the frontier model sometimes deviates from the desired format or style. Fine-tuning largely removes this variance.
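The break-even arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a pricing tool: the function names are hypothetical, the prices and the $0.05 per-request savings are the figures quoted in this report, and real savings depend on your actual token counts and current provider pricing.

```python
import math


def prompt_cost_usd(prompt_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of sending a fixed prompt once, given a per-million-token input price."""
    return prompt_tokens * usd_per_million_tokens / 1_000_000


def break_even_requests(training_cost_usd: float, savings_per_request_usd: float) -> int:
    """Number of requests after which a one-time training cost is recovered."""
    if savings_per_request_usd <= 0:
        raise ValueError("savings_per_request_usd must be positive")
    return math.ceil(training_cost_usd / savings_per_request_usd)


if __name__ == "__main__":
    # Figures taken from the report; treat them as assumptions, not live prices.
    overhead = prompt_cost_usd(prompt_tokens=1500, usd_per_million_tokens=2.50)
    print(f"Prompt overhead per request: ${overhead:.5f}")

    for training_cost in (50, 200):
        n = break_even_requests(training_cost, savings_per_request_usd=0.05)
        print(f"Training cost ${training_cost}: break-even after {n} requests")
```

Plugging in your own measured per-request savings, rather than the assumed $0.05, is the quickest way to check whether a given workload clears the threshold.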
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:17:09.306028+00:00— report_created — created