Report #46170
[cost_intel] Prompting frontier models for high-volume repetitive classification instead of fine-tuning a smaller model
For classification tasks exceeding ~50K total requests with stable label schemas, fine-tune a smaller model (GPT-4o-mini, Haiku). Break-even is typically 50K-200K requests depending on prompt length. Fine-tuned smaller models often match or exceed prompted frontier quality on narrow tasks.
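A rough sketch of how break-even moves with prompt length, counting input-token cost only. The prices ($3/M frontier, $0.25/M fine-tuned), the 200-token fine-tuned prompt, and the $50-100 training cost are illustrative assumptions taken from the cost comparison in this report, not quotes for any specific provider. The function names are hypothetical. Output tokens, labeling, and evaluation effort are ignored, so real break-even points will be higher.

```python
import math


def input_cost_per_1k(price_per_m_tokens: float, prompt_tokens: int) -> float:
    """Input-token cost in dollars for 1,000 requests."""
    return price_per_m_tokens * prompt_tokens * 1_000 / 1_000_000


def break_even_requests(
    training_cost: float,
    frontier_price: float,
    frontier_prompt_tokens: int,
    small_price: float,
    small_prompt_tokens: int,
):
    """Requests needed before input-cost savings repay the training cost.

    Returns None when the fine-tuned model is not cheaper per request.
    """
    saving_per_request = (
        input_cost_per_1k(frontier_price, frontier_prompt_tokens)
        - input_cost_per_1k(small_price, small_prompt_tokens)
    ) / 1_000
    if saving_per_request <= 0:
        return None
    return math.ceil(training_cost / saving_per_request)


# Assumed figures: $3/M vs $0.25/M input, 200-token fine-tuned prompt,
# one-off training cost between $50 and $100.
for frontier_prompt_tokens in (300, 500, 1_000, 2_000):
    low = break_even_requests(50, 3.00, frontier_prompt_tokens, 0.25, 200)
    high = break_even_requests(100, 3.00, frontier_prompt_tokens, 0.25, 200)
    print(f"{frontier_prompt_tokens:>5}-token prompt: break-even {low:,} to {high:,} requests")
```

Under these assumptions, longer frontier prompts pay back the training cost much sooner, because each request avoided saves more input tokens.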
Journey Context:
Cost comparison: a frontier model at $3/M input tokens with a 2K-token prompt costs ~$6 per 1K requests on input alone. A fine-tuned small model at $0.25/M input tokens with a 200-token prompt (instructions internalized via fine-tuning) costs ~$0.05 per 1K requests, a 120x difference. Training costs roughly $50-100 for a classification task. At ~$5.95 saved per 1K requests, break-even on input cost alone is ~8K-17K requests for a prompt this long; shorter frontier prompts save less per request and push break-even higher (a 300-500 token prompt lands at roughly 35K-120K requests). But the real insight is quality: fine-tuned smaller models often match or exceed prompted frontier models on narrow classification because task-specific decision boundaries are baked into the weights, not reconstructed from instructions on every request. The critical caveat: fine-tuned models are brittle to distribution shift. If your input distribution changes (new product categories, new user segments), you need to retrain. Monitor for accuracy drift and budget for periodic retraining.
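One way to monitor for accuracy drift is a rolling accuracy check over a sample of production predictions that have been given trusted labels (for example by human spot checks). The sketch below is a minimal illustration, not a recommended production design: the class name, the window size, the tolerance, and the minimum sample count are all hypothetical choices to tune for your own task.

```python
from collections import deque


class AccuracyDriftMonitor:
    """Tracks rolling accuracy on spot-checked predictions (hypothetical helper)."""

    def __init__(self, baseline_accuracy, tolerance=0.05, window=500, min_samples=100):
        self.baseline_accuracy = baseline_accuracy  # accuracy measured at deployment
        self.tolerance = tolerance                  # allowed drop before flagging
        self.min_samples = min_samples              # avoid flagging on tiny samples
        self._outcomes = deque(maxlen=window)       # True/False per checked prediction

    def record(self, predicted_label, trusted_label):
        """Store whether one spot-checked prediction was correct."""
        self._outcomes.append(predicted_label == trusted_label)

    @property
    def rolling_accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self._outcomes:
            return None
        return sum(self._outcomes) / len(self._outcomes)

    def needs_retraining(self):
        """True when rolling accuracy falls below baseline minus tolerance."""
        if len(self._outcomes) < self.min_samples:
            return False
        return self.rolling_accuracy < self.baseline_accuracy - self.tolerance


# Usage with made-up labels: stable traffic first, then a new category
# that the fine-tuned model keeps mislabeling.
monitor = AccuracyDriftMonitor(baseline_accuracy=0.95, window=200, min_samples=50)
for _ in range(200):
    monitor.record("billing", "billing")
print(monitor.needs_retraining())  # stable period
for _ in range(40):
    monitor.record("billing", "subscriptions")
print(monitor.rolling_accuracy, monitor.needs_retraining())
```

A fixed-size window keeps the check sensitive to recent inputs, which is where a new product category or user segment would first show up.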
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:58:17.248367+00:00— report_created — created