Report #91274
[cost_intel] Fine-tuned Classification Break-Even vs Prompt Engineering
Fine-tune GPT-3.5-turbo for classification tasks with more than 5 classes or more than 1,000 daily predictions; use Claude 3 Haiku with chain-of-thought prompting for binary classification under 1,000 daily predictions.
Journey Context:
Fine-tuning GPT-3.5-turbo for classification costs $0.80 per 1M tokens for training plus $0.30 per 1M tokens for inference, versus $0.50 per 1M tokens for base GPT-3.5-turbo. The break-even point is approximately 2,500 inferences per day for 5-class classification. For binary classification, Claude 3 Haiku reaches 94% accuracy with chain-of-thought prompting at $0.25 per 1M tokens, close to GPT-4's 95% at $10 per 1M tokens, a 40x cost difference. Fine-tuning shows negative ROI for binary tasks (accuracy gain <2%) but 15x ROI for 20-class hierarchical classification, where few-shot prompting fails.
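The break-even arithmetic can be sketched roughly as follows. This is a minimal illustration, not the method used to derive the figures above: the report does not state tokens per request, training-set size, or the amortization window, so those inputs (and the function names) are hypothetical, and the output will not necessarily reproduce the 2,500-per-day figure. Only the per-1M-token prices are taken from the report.

```python
from typing import Optional

# Per-1M-token prices as quoted in the report.
BASE_PRICE_PER_M = 0.50       # base GPT-3.5-turbo inference
FT_PRICE_PER_M = 0.30         # fine-tuned inference
TRAINING_PRICE_PER_M = 0.80   # fine-tuning training tokens


def cost_per_request(tokens_per_request: int, price_per_million_usd: float) -> float:
    """USD cost of one request at a flat per-1M-token price."""
    return tokens_per_request * price_per_million_usd / 1_000_000


def breakeven_requests_per_day(
    training_cost_usd: float,
    amortization_days: int,
    base_tokens_per_request: int,
    ft_tokens_per_request: int,
    base_price_per_m: float = BASE_PRICE_PER_M,
    ft_price_per_m: float = FT_PRICE_PER_M,
) -> Optional[float]:
    """Daily volume at which per-request savings repay the training cost.

    Returns None when the fine-tuned model is not cheaper per request,
    in which case the training cost is never recovered on price alone.
    """
    saving = (
        cost_per_request(base_tokens_per_request, base_price_per_m)
        - cost_per_request(ft_tokens_per_request, ft_price_per_m)
    )
    if saving <= 0:
        return None
    return training_cost_usd / (amortization_days * saving)


if __name__ == "__main__":
    # Hypothetical inputs: 2M training tokens, 30-day amortization,
    # an 800-token few-shot prompt vs a 150-token fine-tuned prompt.
    training_cost = 2_000_000 * TRAINING_PRICE_PER_M / 1_000_000
    print(breakeven_requests_per_day(training_cost, 30, 800, 150))

    # Price ratio behind the "40x" claim: GPT-4 at $10/1M vs Haiku at $0.25/1M.
    print(10.0 / 0.25)
```

The sketch only models token cost. Engineering time, retraining cadence, and the accuracy differences cited above would need to be weighed separately.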
Lifecycle
2026-06-22T11:47:52.031116+00:00— report_created — created