Report #45943
[cost\_intel] Prompting frontier models for high-volume narrow classification tasks instead of fine-tuning
Fine-tune a small model \(GPT-4o-mini, Haiku\) when you have 10K\+ labeled examples AND run 50K\+ inferences/month on a stable task definition. Per-inference cost drops 30-50x while maintaining 95%\+ of frontier model quality.
Journey Context:
Fine-tuning has upfront costs \(training compute of ~$50-200, plus data preparation and evaluation\) but much lower per-inference costs. Fine-tuned GPT-4o-mini costs ~$0.15/1M input tokens, vs GPT-4o at $2.50/1M input \+ $10/1M output. For a sentiment classification task with 500-token prompts, that's ~$0.0001 vs ~$0.005 per inference; note that 500 input tokens alone cost ~$0.00125 on GPT-4o, so the ~$0.005 figure assumes additional output tokens or prompt overhead beyond that. At 1M inferences/month, that's $100 vs $5,000. The crossover: at 50K\+ monthly volume, the fine-tuning investment pays back in 1-4 months. The key constraint that invalidates this: if your classification schema changes monthly, retraining costs erode the savings. Fine-tuning locks you into a task definition, so treat a change to it like a schema migration, not a prompt edit.
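The arithmetic above can be sketched as a small calculator. This is a rough illustration, not a pricing tool: the prices are the ones quoted in this report rather than live rates, and the function names, the $500 total upfront cost, and the $300/month retraining bill are hypothetical values chosen only to show the shape of the calculation.

```python
"""Back-of-envelope comparison: prompted frontier model vs fine-tuned small model."""

# USD per 1M tokens, copied from the figures quoted in this report (not live pricing).
FINE_TUNED_INPUT_PER_M = 0.15
FRONTIER_INPUT_PER_M = 2.50
FRONTIER_OUTPUT_PER_M = 10.00


def cost_per_inference(input_tokens, output_tokens, input_per_m, output_per_m=0.0):
    """Return the USD cost of one call given token counts and per-1M-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def monthly_cost(volume, per_inference):
    """Return the USD cost of `volume` calls in a month."""
    return volume * per_inference


def payback_months(upfront, volume, frontier_cost, fine_tuned_cost, monthly_retrain=0.0):
    """Months until the upfront spend is recovered; inf if there is no net saving."""
    net_saving = volume * (frontier_cost - fine_tuned_cost) - monthly_retrain
    if net_saving <= 0:
        return float("inf")
    return upfront / net_saving


if __name__ == "__main__":
    # Monthly bill at 1M inferences, using the report's rounded per-inference figures.
    print(monthly_cost(1_000_000, 0.0001), monthly_cost(1_000_000, 0.005))
    # Hypothetical $500 total upfront cost at 50K inferences/month.
    print(payback_months(500, 50_000, 0.005, 0.0001))
    # Same setup with a hypothetical $300/month retraining bill: savings are wiped out.
    print(payback_months(500, 50_000, 0.005, 0.0001, monthly_retrain=300))
```

Keeping token counts and retraining cost as explicit inputs makes it easy to rerun the estimate with your own prompt length, output length, and how often you expect the schema to change.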
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:35:34.161838+00:00— report_created — created