Report #64244
[cost_intel] Using few-shot prompting for high-volume classification instead of fine-tuning
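For classification tasks running more than 2M inferences/month with more than 50k labeled examples, fine-tune Claude 3.5 Haiku instead of few-shot prompting Claude 3.5 Sonnet. Break-even is at roughly 2M queries; after fine-tuning, Haiku matches few-shot Sonnet accuracy at about 1/20th the cost by the report's comparison ($0.80 vs $15.00 per 1M tokens; at list rates $0.80 is Haiku's input price and $15.00 is Sonnet's output price, so the like-for-like per-token gap is smaller and most of the saving comes from the shorter prompt).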
Journey Context:
Few-shot examples add 500-1000 tokens to every query. At 1M queries/day that is 0.5-1B extra input tokens per day, an estimated $5k+/day in input tokens alone. Fine-tuning bakes the examples into the weights, cutting each inference to under 100 tokens. The training cost ($2-5k) then amortizes in days. Common mistakes: fine-tuning GPT-4 when Haiku suffices, or continuing to few-shot after volume justifies fine-tuning.
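The break-even reasoning above can be sketched as a small calculation. This is a minimal sketch, not a pricing tool: the `Scenario` fields, the function names, and every number in the example are illustrative assumptions (token counts chosen inside the ranges quoted above, per-token rates as placeholders), so substitute your own measured prompt sizes and current provider rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One deployment option. Every value is a caller-supplied assumption."""
    input_tokens_per_query: int    # prompt size, including any few-shot examples
    output_tokens_per_query: int   # tokens generated per classification
    input_price_per_mtok: float    # USD per 1M input tokens
    output_price_per_mtok: float   # USD per 1M output tokens


def cost_per_query(s: Scenario) -> float:
    """USD cost of a single inference under the given scenario."""
    total = (
        s.input_tokens_per_query * s.input_price_per_mtok
        + s.output_tokens_per_query * s.output_price_per_mtok
    )
    return total / 1_000_000


def break_even_queries(few_shot: Scenario, fine_tuned: Scenario,
                       training_cost: float) -> float:
    """Queries needed before the one-off training cost is recovered."""
    saving = cost_per_query(few_shot) - cost_per_query(fine_tuned)
    if saving <= 0:
        return float("inf")  # fine-tuning never pays back at these inputs
    return training_cost / saving


# Illustrative inputs only (placeholders, not quoted prices):
# a 900-token few-shot prompt on the larger model vs a 100-token prompt
# on the fine-tuned smaller model, each emitting a 5-token label.
FEW_SHOT = Scenario(900, 5, 3.00, 15.00)
FINE_TUNED = Scenario(100, 5, 0.80, 4.00)
TRAINING_COST = 5_000.0  # upper end of the $2-5k range in the report

if __name__ == "__main__":
    queries = break_even_queries(FEW_SHOT, FINE_TUNED, TRAINING_COST)
    print(f"break-even after ~{queries:,.0f} queries")
```

With these placeholder inputs the break-even lands a little under 2M queries, which is in line with the ~2M figure in the summary. The result is sensitive to the assumed prompt sizes and rates, so treat it as a sanity check, not a forecast.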
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:19:06.797416+00:00— report_created — created