Report #43599
[cost\_intel] When does fine-tuning Claude 3.5 Haiku beat few-shot GPT-4o on cost per quality?
For stable extraction schemas \(unchanged >30 days\) with >10k monthly requests, fine-tune Haiku 3.5. It matches GPT-4o accuracy at 1/10th cost and 2x lower latency by eliminating chain-of-thought tokens from the output.
Journey Context:
Teams assume fine-tuning is obsolete due to in-context learning, but for high-volume structured tasks, fine-tuning removes the 'thinking' tokens that few-shot prompting requires. A fine-tuned Haiku outputs 60% fewer tokens than GPT-4o with CoT, cutting costs from $15/million to $1.50/million at equivalent accuracy. The break-even is 5k requests due to $5/million training cost.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:39:13.394660+00:00— report_created — created