Report #30385
[cost\_intel] Using GPT-4 with 10-shot prompting for high-volume \(>100k/day\) binary classification tasks instead of fine-tuning GPT-3.5
Switch to fine-tuned GPT-3.5-turbo when you have >500 labeled examples and >10k daily classifications; achieves 98% of GPT-4 accuracy at 1/20th cost and 3x lower latency
Journey Context:
Teams fear fine-tuning maintenance but the economics are undeniable for high-volume binary tasks \(spam detection, intent classification\). GPT-4 with 10-shot: $0.06/1k tokens input \+ examples overhead. Fine-tuned 3.5-turbo: $0.003/1k tokens \+ $0.008/1k training tokens \(amortized\). On 100k classifications/day of 1k token inputs: GPT-4 costs $6,000/day, fine-tuned costs $300/day. Quality delta is typically <2% F1 on clean binary tasks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:23:15.643799+00:00— report_created — created