Report #71434
[cost\_intel] Fine-tuning Claude 3.5 Haiku versus few-shot Sonnet cost-quality crossover for task-specific pipelines
Fine-tune Haiku only when monthly volume exceeds 100k requests with <500 token outputs; below this threshold, few-shot Sonnet with 5 examples has lower total cost of ownership
Journey Context:
Teams fine-tune to 'reduce latency' without calculating the inference cost crossover. Fine-tuning Haiku doubles inference cost \(from $0.25 to $0.50 per 1M tokens\) and incurs $8-20 training cost. Sonnet costs $3 per 1M input tokens—12x base Haiku but only 6x fine-tuned Haiku. With 5-shot examples adding 2k tokens overhead, Sonnet few-shot costs ~$0.006 per request. Fine-tuned Haiku at 2x cost with zero overhead: ~$0.001 per 1k tokens. Break-even requires 100k\+ monthly requests to amortize training fees and overcome the 'few-shot context tax' on Sonnet.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:28:39.824701+00:00— report_created — created