Report #77230
[cost_intel] Prompting frontier models for stable high-volume repetitive tasks instead of fine-tuning smaller models
Fine-tune a small model when: the task format has been stable for 3+ months, you have 500+ quality examples, and volume exceeds ~50K requests. Cost per quality point then drops 5-10x.
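The three thresholds above can be read as a simple checklist. The sketch below encodes them literally, assuming the cutoffs are applied as hard minimums (in practice they are rules of thumb). `TaskProfile` and `fine_tune_candidate` are hypothetical names for illustration, not part of any library.

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds from the rule of thumb above; treat them as starting points, not hard limits.
MIN_STABLE_MONTHS = 3
MIN_QUALITY_EXAMPLES = 500
MIN_EXPECTED_REQUESTS = 50_000


@dataclass
class TaskProfile:
    """Hypothetical summary of a prompted workload."""
    stable_months: float    # how long the task format has gone unchanged
    quality_examples: int   # labeled input/output pairs you trust
    expected_requests: int  # requests you expect to serve with the fine-tuned model


def fine_tune_candidate(task: TaskProfile) -> tuple[bool, list[str]]:
    """Return (is_candidate, unmet_conditions) under the three-threshold rule."""
    unmet: list[str] = []
    if task.stable_months < MIN_STABLE_MONTHS:
        unmet.append(f"format stable for {task.stable_months} months, need {MIN_STABLE_MONTHS}+")
    if task.quality_examples < MIN_QUALITY_EXAMPLES:
        unmet.append(f"{task.quality_examples} quality examples, need {MIN_QUALITY_EXAMPLES}+")
    if task.expected_requests < MIN_EXPECTED_REQUESTS:
        unmet.append(f"{task.expected_requests} expected requests, need ~{MIN_EXPECTED_REQUESTS}+")
    # The task is a candidate only if every condition is met.
    return (not unmet, unmet)


if __name__ == "__main__":
    ok, unmet = fine_tune_candidate(
        TaskProfile(stable_months=6, quality_examples=1200, expected_requests=200_000)
    )
    print(ok, unmet)
```

Returning the list of unmet conditions, rather than a bare boolean, makes it clear which requirement (stability, data, or volume) is holding a workload back.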
Journey Context:
Fine-tuning GPT-4o-mini costs ~$100-300 for training on 500-5000 examples, but inference is about 33x cheaper than GPT-4o at the input prices quoted below ($0.15/M vs $5/M). At 50K+ requests, the training cost is amortized and the per-request savings dominate.

A pipeline running 200K classification/extraction requests/month on GPT-4o at $5/M input tokens costs ~$1000/month, which implies roughly 1000 input tokens per request. The same pipeline on a fine-tuned GPT-4o-mini at $0.15/M input costs ~$30/month.

Fine-tuning wins when the prompt is over 500 tokens and never changes, the task is narrow and well-defined, and you have labeled data. Prompting wins when the task format changes frequently, you lack training data, or the task requires broad reasoning beyond the training distribution.

The signature of a fine-tuning candidate: your system prompt is a fixed 1000+ token instruction that never changes between requests.
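The cost figures above follow from per-token arithmetic. The sketch below reproduces them and computes how many requests it takes for per-request savings to repay a one-off training cost. It assumes ~1000 input tokens per request (implied by $1000/month for 200K requests at $5/M), counts input tokens only, and uses the prices quoted in this report; check current pricing, since fine-tuned inference is not necessarily billed at the base-model rate. The function names are hypothetical.

```python
def monthly_input_cost(requests: int, tokens_per_request: int, usd_per_million: float) -> float:
    """Input-token cost in USD for one month of traffic (output tokens ignored)."""
    return requests * tokens_per_request / 1_000_000 * usd_per_million


def break_even_requests(training_usd: float, tokens_per_request: int,
                        large_usd_per_million: float, small_usd_per_million: float) -> float:
    """Requests needed before per-request savings repay the one-off training cost."""
    saving_per_request = tokens_per_request / 1_000_000 * (large_usd_per_million - small_usd_per_million)
    if saving_per_request <= 0:
        raise ValueError("the fine-tuned model must be cheaper per token to ever break even")
    return training_usd / saving_per_request


if __name__ == "__main__":
    TOKENS = 1000  # assumed average input tokens per request
    print(f"GPT-4o:          ${monthly_input_cost(200_000, TOKENS, 5.00):,.0f}/month")
    print(f"fine-tuned mini: ${monthly_input_cost(200_000, TOKENS, 0.15):,.0f}/month")
    for training in (100, 300):
        n = break_even_requests(training, TOKENS, 5.00, 0.15)
        print(f"training ${training}: break-even after ~{n:,.0f} requests")
```

With these inputs the break-even lands between roughly 21K and 62K requests for a $100-300 training run, in line with the ~50K threshold above. Shorter prompts shrink the per-request saving and push the break-even point higher.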
Lifecycle
2026-06-21T12:13:21.076243+00:00 - report_created - created