Report #96365
[cost_intel] Complex prompting for high-volume narrow tasks instead of fine-tuning a small model
When running >10K requests/day on a single task type with a consistent output schema, fine-tune GPT-4o-mini or Haiku on 500+ examples instead of prompting a frontier model. Typical result: 10-25x cost reduction with equal or better task-specific quality. The signal that you should fine-tune: your prompt contains >1,000 tokens of task-specific instructions that never change between requests.
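The thresholds above can be turned into a quick screening check. This is a minimal sketch, not a definitive rule: the `Workload` class, its field names, and `should_consider_fine_tuning` are hypothetical names introduced for illustration, and the cutoffs are simply the ones stated in this report.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of one task type (names are illustrative)."""
    requests_per_day: int        # volume for this single task type
    static_prompt_tokens: int    # instruction tokens that never change between requests
    labeled_examples: int        # high-quality input/output pairs available for training
    fixed_output_schema: bool    # True if every response follows the same schema


def should_consider_fine_tuning(w: Workload) -> bool:
    """Apply the report's rule-of-thumb thresholds; all four must hold."""
    return (
        w.requests_per_day > 10_000
        and w.static_prompt_tokens > 1_000
        and w.labeled_examples >= 500
        and w.fixed_output_schema
    )


if __name__ == "__main__":
    candidate = Workload(
        requests_per_day=20_000,
        static_prompt_tokens=1_500,
        labeled_examples=800,
        fixed_output_schema=True,
    )
    print(should_consider_fine_tuning(candidate))
```

A `True` result only flags the workload as worth evaluating; it says nothing about whether the fine-tuned model will actually match the frontier model's quality on your data.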
Journey Context:
Fine-tuning shifts the cost-quality curve by baking task knowledge into the weights, which removes the need for expensive in-context instructions. Training costs ~$50-200 for 500-2K examples on mini models. At 10K requests/day, a 1,500-token task prompt is 15M input tokens/day, which on Sonnet costs ~$45/day in input tokens alone; the same task on fine-tuned Haiku costs ~$3.75/day total. Fine-tuning wins when: (1) the task is narrow and repetitive, (2) the output format is fixed, (3) you have >500 high-quality examples. Prompting wins when: (1) the task varies significantly between requests, (2) you need general reasoning, (3) volume is low. The failure mode of fine-tuning is overfitting to the training distribution, so that novel inputs produce confidently wrong outputs. Mitigate with a held-out eval set that covers edge cases.
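The arithmetic behind these figures can be sketched as follows. The $3.00 per million input tokens rate is an assumption inferred from the ~$45/day figure (15M tokens/day), not a quoted price list, and the function names are hypothetical. The $3.75/day and $200 training cost are taken from the report as given.

```python
from typing import Optional


def daily_input_cost(requests_per_day: int,
                     prompt_tokens: int,
                     usd_per_million_input_tokens: float) -> float:
    """Daily spend on the static prompt alone (input tokens only)."""
    total_tokens = requests_per_day * prompt_tokens
    return total_tokens * usd_per_million_input_tokens / 1_000_000


def break_even_days(training_cost_usd: float,
                    baseline_daily_usd: float,
                    fine_tuned_daily_usd: float) -> Optional[float]:
    """Days until the one-off training cost is repaid by daily savings.

    Returns None when the fine-tuned setup is not cheaper per day.
    """
    savings = baseline_daily_usd - fine_tuned_daily_usd
    if savings <= 0:
        return None
    return training_cost_usd / savings


if __name__ == "__main__":
    # Assumed rate: $3.00 per million input tokens (inferred, check current pricing).
    baseline = daily_input_cost(10_000, 1_500, 3.00)
    fine_tuned = 3.75  # report's figure for fine-tuned Haiku, total per day
    print(f"baseline input cost/day: ${baseline:.2f}")
    print(f"cost ratio: {baseline / fine_tuned:.1f}x")
    print(f"break-even on $200 training: {break_even_days(200, baseline, fine_tuned):.1f} days")
```

Under these assumptions the ratio lands at 12x, inside the 10-25x range quoted above, and even the upper training estimate is recovered in under a week. Substitute your provider's current per-token prices before relying on the result.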
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:19:50.640049+00:00— report_created — created