Report #62128
[cost_intel] Spending thousands/month on frontier-model API calls for a repetitive, narrow task that a fine-tuned small model could handle
For narrow, high-volume tasks (more than 50K requests/month; single-label classification, fixed-format extraction, a specific style transfer), fine-tune GPT-4o-mini or a small open model instead of calling a frontier model. On such tasks, fine-tuned mini models typically achieve 90-95% of frontier quality at 1/30th to 1/50th of the per-request cost. Training costs ($50-200) typically amortize within 1-2 weeks at high volume.
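The amortization claim can be checked with a short break-even calculation. This is a minimal sketch: the function name `break_even_days` is hypothetical, and the per-request prices are the illustrative figures from this report's Journey Context, not current list prices.

```python
def break_even_days(training_cost, requests_per_day, frontier_price, tuned_price):
    """Days until per-request savings cover a one-time training cost."""
    saving_per_request = frontier_price - tuned_price
    if saving_per_request <= 0 or requests_per_day <= 0:
        raise ValueError("fine-tuning never pays back at these inputs")
    return training_cost / (requests_per_day * saving_per_request)


# Illustrative per-request prices from this report (assumptions, not list prices).
FRONTIER_PRICE = 0.002    # GPT-4o, USD per request
TUNED_PRICE = 0.00004     # fine-tuned GPT-4o-mini, USD per request

# Upper end of the quoted training cost range, at three daily volumes.
for requests_per_day in (2_000, 10_000, 100_000):
    days = break_even_days(200, requests_per_day, FRONTIER_PRICE, TUNED_PRICE)
    print(f"{requests_per_day:>7} req/day: break-even after {days:.1f} days")
```

Break-even time scales inversely with volume, so plug in your own request count and measured per-request costs before committing to a training run.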
Journey Context:
A classification task running 100K requests/day on GPT-4o at ~$0.002/request costs $200/day, or about $6K/month. The same task on a fine-tuned GPT-4o-mini at ~$0.00004/request costs $4/day, or about $120/month, a 50x reduction. Training cost: ~$50-100 for a few hundred examples.
The quality catch: fine-tuning only works for narrow, well-defined tasks where the desired behavior is consistent. If your task requires general reasoning, handling novel inputs, or complex conditional logic, a fine-tuned small model underperforms a prompted frontier model.
The critical degradation signature: fine-tuned models handle in-distribution inputs well but fail silently on edge cases outside the training distribution, because they pattern-match rather than reason. Monitor for sudden accuracy drops on inputs that differ from the training data distribution, and retrain quarterly or whenever you detect distribution shift.
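One way to catch silent failures is to send a small sample of production inputs to a trusted reference (a frontier model or a human reviewer) and track how often the fine-tuned model agrees with it. The sketch below assumes that setup. The class name `AgreementMonitor`, the window size, and the 0.90 threshold are all hypothetical choices, not values from this report.

```python
from collections import deque


class AgreementMonitor:
    """Rolling agreement rate between a fine-tuned model and a reference label."""

    def __init__(self, window=500, min_agreement=0.90, min_samples=50):
        # Only the most recent `window` comparisons are kept.
        self.results = deque(maxlen=window)
        self.min_agreement = min_agreement   # hypothetical alert threshold
        self.min_samples = min_samples       # avoid alerting on tiny samples

    def record(self, tuned_label, reference_label):
        """Store whether the fine-tuned output matched the reference."""
        self.results.append(tuned_label == reference_label)

    def agreement(self):
        """Fraction of recent comparisons that matched, or None if empty."""
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def should_alert(self):
        """True once enough samples exist and agreement is below threshold."""
        if len(self.results) < self.min_samples:
            return False
        return self.agreement() < self.min_agreement
```

In use, call `record(...)` for each sampled request and check `should_alert()` on a schedule. A sustained alert is a reasonable trigger for collecting the disagreeing inputs as new training examples and retraining.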
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:46:04.123798+00:00— report_created — created