Report #49350
[cost\_intel] Prompting frontier models for repetitive high-volume tasks instead of fine-tuning a smaller model
For tasks running over 10K inferences/day with consistent input/output patterns \(extraction, formatting, classification\), fine-tune GPT-4o-mini or an equivalent small model. The fine-tuned small model typically matches prompted-frontier quality at 10-15x lower per-inference cost, and the training investment typically pays back within 1-2 weeks.
Journey Context:
Fine-tuning bakes the task pattern into the model weights, so you no longer need to re-send a lengthy system prompt and few-shot examples on every call. The pattern that signals fine-tuning ROI: you are sending the same 2K-token prompt with 5 few-shot examples to GPT-4o for a task like extracting 12 fields from a consistent document format. A fine-tuned GPT-4o-mini at roughly $0.15/1M input tokens versus GPT-4o at roughly $2.50/1M input tokens is a ~16x reduction in input-token price, and the fine-tuned model needs minimal prompt engineering. Training on 500-2000 examples costs $50-200. At 10K inferences/day with a 2K-token prompt, GPT-4o input alone costs about $50/day \(20M tokens at $2.50/1M\), so the training cost pays back in under a week. The failure modes: \(1\) distribution shift, where new document types or fields require retraining; \(2\) tasks requiring broad world knowledge that the base small model lacks, since fine-tuning teaches format, not facts; \(3\) very low volume, where the training cost never amortizes.
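The payback arithmetic above can be sketched as a small calculator. This is a minimal illustration, not a pricing tool: the `Scenario` fields and function names are hypothetical, only input tokens are counted \(output tokens and data-preparation effort are ignored\), and the prices are the rough figures quoted in this report. Providers may bill fine-tuned inference at a different rate than the base model, so substitute current rates before relying on the result.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical container for the inputs to the payback estimate."""
    calls_per_day: int            # inference volume
    frontier_prompt_tokens: int   # input tokens per call to the prompted frontier model
    tuned_prompt_tokens: int      # input tokens per call to the fine-tuned small model
    frontier_price_per_m: float   # USD per 1M input tokens (assumed rate)
    tuned_price_per_m: float      # USD per 1M input tokens (assumed rate)
    training_cost: float          # one-off fine-tuning cost in USD


def daily_input_cost(calls: int, tokens_per_call: int, price_per_m: float) -> float:
    """Daily spend on input tokens only, in USD."""
    return calls * tokens_per_call * price_per_m / 1_000_000


def payback_days(s: Scenario) -> float:
    """Days until the training cost is recovered by input-token savings.

    Returns infinity when the fine-tuned model is not cheaper per day.
    """
    frontier = daily_input_cost(s.calls_per_day, s.frontier_prompt_tokens, s.frontier_price_per_m)
    tuned = daily_input_cost(s.calls_per_day, s.tuned_prompt_tokens, s.tuned_price_per_m)
    savings = frontier - tuned
    if savings <= 0:
        return float("inf")
    return s.training_cost / savings


if __name__ == "__main__":
    # Figures from this report: 10K calls/day, 2K-token prompt, upper-end training cost.
    # The fine-tuned prompt is conservatively kept at 2K tokens; in practice it is shorter.
    high_volume = Scenario(10_000, 2_000, 2_000, 2.50, 0.15, 200.0)
    # Same task at 100 calls/day, illustrating failure mode (3).
    low_volume = Scenario(100, 2_000, 2_000, 2.50, 0.15, 200.0)
    print(f"high volume: {payback_days(high_volume):.1f} days")
    print(f"low volume:  {payback_days(low_volume):.1f} days")
```

With the report's figures, the high-volume case recovers a $200 training cost in roughly 4.3 days, while the same task at 100 calls/day takes well over a year. Shortening the fine-tuned prompt \(lowering `tuned_prompt_tokens`\) only improves the result.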
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:19:12.973508+00:00— report_created — created