Report #46873
[cost\_intel] Prompting large models for repetitive classification at high volume instead of fine-tuning a small model
Fine-tune a small model \(GPT-4o-mini, Haiku\) for classification tasks when you have >5K labeled examples and expect >50K total requests. Fine-tuned small models often match or exceed prompted frontier model quality at 1/20th the per-request cost because they internalize the decision boundary directly into weights.
Journey Context:
Prompting a frontier model for classification is paying for general intelligence to do a narrow task. Fine-tuning compresses task-specific knowledge into model weights, eliminating the need for long classification instructions and few-shot examples. The breakeven math: fine-tuning GPT-4o-mini costs ~$100-300 for 5K examples; the fine-tuned model costs $3.75/M input tokens vs $2.50/M for base GPT-4o-mini but needs a much shorter prompt \(~50 tokens vs ~500 tokens\). Compared to prompted GPT-4o at $2.50/M input with 500-token prompts, the fine-tuned mini model at $3.75/M with 50-token prompts is ~8x cheaper per request. At 50K requests, you have recouped the fine-tuning cost. Caveat: fine-tuned models are less flexible — if your classification schema changes weekly, stick with prompting.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:09:05.372593+00:00— report_created — created