Report #53298
[cost\_intel] When does fine-tuning Claude 3 Haiku beat few-shot GPT-4o for JSON extraction on cost-quality frontier?
Fine-tune Haiku when schema has >5 nested fields and volume exceeds 100k requests/day; below this scale, 4o with 3-shot examples dominates on latency and avoids $3k\+ tuning sunk cost.
Journey Context:
Teams assume fine-tuning always reduces per-token cost, but Haiku fine-tuning requires 50\+ examples and API costs that amortize only at scale. The quality gap: Haiku fine-tuned matches 4o on strict schema adherence but hallucinates 8% more on optional fields. For <100k daily, 4o's zero-shot JSON mode with retry loops is cheaper than maintaining a tuned model. Only tune when schema stability is guaranteed for 6\+ months.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:57:29.589984+00:00— report_created — created