Report #70103
[cost\_intel] Prompting frontier models for high-volume repetitive classification instead of fine-tuning smaller models
When running more than 50K classification requests/day on a stable schema, fine-tune a smaller model \(e.g., GPT-4o-mini, Haiku\) instead of prompting a frontier model. Fine-tuned small models typically match prompted frontier quality on the trained task at 5-20x lower cost. The crossover point is approximately 10K-50K daily requests, depending on task complexity and training data availability; above 50K/day the case for fine-tuning is usually clear.
Journey Context:
The math is decisive. Prompting Sonnet at $3/M input tokens with a 500-token classification prompt costs $0.0015/request. Fine-tuned GPT-4o-mini at $0.15/M input tokens costs $0.000075/request, a 20x cost reduction. At 50K requests/day, that is $75/day vs $3.75/day, a saving of $71.25/day or over $2,000/month. These figures count input tokens only; check current provider pricing, since fine-tuned inference may be billed above the base rate. Fine-tuning also has upfront costs: dataset preparation \(you need 50-500 high-quality examples minimum\), training runs \($5-50 depending on provider and dataset size\), and evaluation infrastructure. At high volume, the break-even on that upfront investment is typically days to weeks. The quality signature to watch: fine-tuned models degrade on edge cases not represented in the training data. Maintain a held-out test set of edge cases, and retrain quarterly or whenever the input distribution shifts. Fine-tuned models are also less flexible: if your classification schema changes, you must retrain.
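The arithmetic above can be reproduced with a short Python sketch. The per-token rates, prompt size, and request volume come from this report; the `upfront_usd` figure of $1,500 is a hypothetical placeholder for dataset preparation, training, and evaluation effort, and the names `CostScenario`, `cost_per_request`, and `summarize` are illustrative only. Like the figures above, it counts input tokens only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostScenario:
    tokens_per_request: int     # input tokens per classification prompt
    requests_per_day: int
    frontier_usd_per_m: float   # prompted frontier model, USD per 1M input tokens
    small_usd_per_m: float      # fine-tuned small model, USD per 1M input tokens
    upfront_usd: float          # ASSUMPTION: one-off fine-tuning investment


def cost_per_request(tokens: int, usd_per_million: float) -> float:
    """Input-token cost of a single request, in USD."""
    return tokens * usd_per_million / 1_000_000


def summarize(s: CostScenario, days_per_month: int = 30) -> dict:
    """Daily costs, cost ratio, monthly saving, and break-even time in days."""
    frontier = cost_per_request(s.tokens_per_request, s.frontier_usd_per_m)
    small = cost_per_request(s.tokens_per_request, s.small_usd_per_m)
    daily_saving = (frontier - small) * s.requests_per_day
    # No break-even exists if the small model is not actually cheaper.
    break_even: Optional[float] = (
        s.upfront_usd / daily_saving if daily_saving > 0 else None
    )
    return {
        "frontier_daily_usd": frontier * s.requests_per_day,
        "small_daily_usd": small * s.requests_per_day,
        "cost_ratio": frontier / small,
        "monthly_saving_usd": daily_saving * days_per_month,
        "break_even_days": break_even,
    }


# Figures from the report, plus the hypothetical $1,500 upfront cost.
example = CostScenario(
    tokens_per_request=500,
    requests_per_day=50_000,
    frontier_usd_per_m=3.0,
    small_usd_per_m=0.15,
    upfront_usd=1_500.0,
)
report = summarize(example)

if __name__ == "__main__":
    for key, value in report.items():
        print(f"{key}: {value:.2f}")
```

Re-running `summarize` with your own token counts, volumes, and a realistic upfront estimate shows where the crossover falls for a given workload; at 10K requests/day, for instance, the same upfront cost takes five times longer to recover.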
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:15:05.181001+00:00— report_created — created