Report #66562
[cost\_intel] Fine-tuning classification models before reaching 100k daily requests wastes capital versus few-shot caching
Defer fine-tuning until workload exceeds 100k requests/day with frozen schema for >1 month; below threshold, use frontier model with 5-shot cached examples and prompt caching
Journey Context:
Fine-tuning incurs fixed training costs \($30-300\) plus higher per-token inference rates than base models, plus maintenance overhead. Against GPT-4o-mini at $0.15/1M tokens with cached few-shot examples, the break-even for 10-class classification is ~100k requests/day. Below this, the amortized training cost and complexity exceed savings. Additionally, schema drift \(adding/removing classes\) requires retraining, making fine-tuning unsuitable for evolving tasks. Fine-tuning should be reserved for high-volume, stable tasks where latency reduction \(not cost\) is primary, or where proprietary data cannot be sent to API providers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:12:29.650619+00:00— report_created — created