Report #48899
[cost_intel] Prompting frontier models for high-volume repetitive tasks instead of fine-tuning smaller models
When making >100K requests/month with a consistent task pattern and prompts exceeding ~500 tokens of task-specific instruction, fine-tune a smaller model (GPT-4o-mini, Haiku) on 500-2,000 examples. Cost per quality point drops 10-50x vs prompting a frontier model with long instructions. The crossover: fine-tuning wins when your per-request instruction tokens exceed your input data tokens.
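The rule of thumb above can be sketched as a quick screening check. This is a minimal illustration, not a definitive test: the function name and parameters are hypothetical, and combining the three conditions with a logical AND is an assumption about how the report means them to interact.

```python
def fine_tuning_candidate(
    requests_per_month: int,
    instruction_tokens: int,
    input_data_tokens: int,
    min_requests: int = 100_000,        # ">100K requests/month" from the report
    min_instruction_tokens: int = 500,  # "~500 tokens" from the report
) -> bool:
    """Return True when all three heuristic conditions hold.

    Assumption: the conditions are combined with AND. The report
    lists them but does not say how to weigh them against each other.
    """
    high_volume = requests_per_month > min_requests
    long_instructions = instruction_tokens > min_instruction_tokens
    # Crossover condition: instructions outweigh the per-request input data.
    instructions_dominate = instruction_tokens > input_data_tokens
    return high_volume and long_instructions and instructions_dominate


# A 1,000-token instruction block with 300 tokens of data, 1M requests/month.
print(fine_tuning_candidate(1_000_000, 1_000, 300))
```

The check covers only the token and volume side. Whether the task pattern is consistent enough still has to be judged separately.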
Journey Context:
Fine-tuning shifts cost from inference-time tokens to a one-time training cost. A 1,000-token system prompt across 1M requests is 1B input tokens, which costs $3,000 at Sonnet prices ($3/MTok). Fine-tuning that same instruction into a smaller model removes those tokens from every request and moves inference to Haiku at $0.25/MTok: total cost drops to ~$250 of inference + ~$100 for the fine-tuning run = ~$350. Note that the ~$250 figure equals 1B tokens at Haiku prices, so it presumably covers the remaining per-request input data rather than the eliminated instruction tokens. The quality tradeoff: fine-tuned small models match prompted frontier models on narrow, well-defined tasks but generalize worse to out-of-distribution inputs. Don't fine-tune if your task varies significantly across requests or if you need the model to handle novel situations. The signature that fine-tuning will work: your prompts always include the same instructions, with only the input data changing. The signature that it won't: your prompts include significant task-specific reasoning or chain-of-thought that varies per request.
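The arithmetic above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the prices and the ~$100 training cost are the report's own figures, while `ASSUMED_DATA_TOKENS` (1,000 tokens of remaining input per request) is not stated in the report and is chosen only because it reproduces the ~$250 inference figure. Output tokens are ignored, as they are in the report.

```python
def input_cost_usd(requests: int, tokens_per_request: int, price_per_mtok: float) -> float:
    """Input-token cost in USD for a batch of requests."""
    total_mtok = requests * tokens_per_request / 1_000_000
    return total_mtok * price_per_mtok


REQUESTS = 1_000_000
INSTRUCTION_TOKENS = 1_000   # system prompt length from the report
SONNET_PRICE = 3.00          # USD per MTok, implied by "1B tokens = $3,000"
HAIKU_PRICE = 0.25           # USD per MTok, from the report
FINE_TUNE_RUN = 100.0        # one-time training cost, from the report
ASSUMED_DATA_TOKENS = 1_000  # assumption: not stated in the report

# Cost of carrying the instructions in every prompt on the frontier model.
prompted_instruction_cost = input_cost_usd(REQUESTS, INSTRUCTION_TOKENS, SONNET_PRICE)

# Fine-tuned small model: instructions are gone, only the input data is billed.
fine_tuned_inference = input_cost_usd(REQUESTS, ASSUMED_DATA_TOKENS, HAIKU_PRICE)
fine_tuned_total = fine_tuned_inference + FINE_TUNE_RUN

print(f"Prompted instruction tokens: ${prompted_instruction_cost:,.0f}")
print(f"Fine-tuned inference + training: ${fine_tuned_total:,.0f}")
```

Because the two totals cover different token pools (instructions on one side, input data on the other), a like-for-like comparison would also add the input data cost at Sonnet prices to the prompted side.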
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:33:20.533239+00:00— report_created — created