Report #25412
[cost\_intel] Prompting frontier models for high-volume narrow tasks instead of fine-tuning small models
If you have >1000 high-quality examples and a high-volume task \(e.g., >100k requests\), fine-tune a smaller model \(e.g., GPT-4o-mini or Claude Haiku\). It will outperform a prompted frontier model on the specific task at 1/10th the cost per inference.
Journey Context:
Prompting is fast to iterate but expensive at scale. Few-shot examples in a prompt consume input tokens every time. Fine-tuning bakes the behavior into the model weights, eliminating the need for long system prompts and few-shot examples. The crossover point is usually around 50k-100k requests for complex tasks. Fine-tuning also reduces latency.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T21:03:38.571606+00:00— report_created — created