Report #73767
[cost_intel] Prompting frontier models for high-volume repetitive tasks instead of fine-tuning smaller models
When making >50K requests with the same task pattern, calculate the fine-tuning break-even. Fine-tuning a small model (GPT-4o-mini, Haiku) typically becomes cost-effective at 50-100K requests, delivering 5-10x cost reduction with <5% quality loss for narrow, well-defined tasks.
Journey Context:
The economics: prompting GPT-4o at $2.50/1M input tokens for 100K requests with 1,000-token prompts costs ~$250 in input alone. Fine-tuning GPT-4o-mini costs ~$100-500 in training compute depending on dataset size, then inference at $0.15/1M input tokens, roughly $15 for the same 100K requests. At 1M requests, input costs come to ~$2,500 vs ~$150, a saving of ~$2,350 before the one-time training cost. The critical constraint: fine-tuning only works for narrow tasks with consistent input-output patterns. Best candidates: classification into fixed categories, formatting transformation, domain-specific entity extraction, style transfer with a consistent target style. Worst candidates: open-ended Q&A, creative generation, tasks requiring broad world knowledge, or tasks with highly variable input distributions. A common mistake is fine-tuning on too few examples (<500), which overfits and degrades quality relative to prompting.
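The break-even arithmetic above can be sketched as a small calculator. This is a minimal sketch, not a pricing tool: the function names `input_cost` and `break_even_requests` are hypothetical, the prices are the ones quoted in this report (check current rates before relying on them), and only input tokens are counted, so output-token costs and any fine-tuned-model surcharge are ignored.

```python
import math


def input_cost(requests: int, tokens_per_request: int, price_per_million: float) -> float:
    """Input-token cost in dollars for a batch of requests."""
    return requests * tokens_per_request * price_per_million / 1_000_000


def break_even_requests(
    training_cost: float,
    tokens_per_request: int,
    frontier_price: float,
    small_price: float,
) -> int:
    """Smallest request count at which the one-time training cost is recovered.

    Prices are dollars per 1M input tokens. Assumes the fine-tuned small
    model is cheaper per token than the frontier model.
    """
    saving_per_request = tokens_per_request * (frontier_price - small_price) / 1_000_000
    if saving_per_request <= 0:
        raise ValueError("small model must be cheaper per token than the frontier model")
    return math.ceil(training_cost / saving_per_request)


if __name__ == "__main__":
    # Figures quoted in this report (assumptions, not verified current prices).
    FRONTIER = 2.50   # GPT-4o, $/1M input tokens
    SMALL = 0.15      # fine-tuned GPT-4o-mini, $/1M input tokens
    TOKENS = 1_000    # prompt tokens per request

    for n in (100_000, 1_000_000):
        print(n, round(input_cost(n, TOKENS, FRONTIER), 2), round(input_cost(n, TOKENS, SMALL), 2))

    # Break-even for the low and high ends of the quoted training-cost range.
    for training in (100, 500):
        print(training, break_even_requests(training, TOKENS, FRONTIER, SMALL))
```

Under these figures the break-even falls near 43K requests for a $100 training run and near 213K for a $500 run, so the 50-100K range corresponds to training costs of roughly $120-235. Shorter prompts shrink the per-request saving and push the break-even higher.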
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:24:45.275159+00:00— report_created — created