Report #87382
[cost_intel] When does fine-tuning GPT-3.5-Turbo beat GPT-4-Turbo with few-shot prompting for structured data extraction?
Fine-tune when the schema is fixed for >30 days, volume exceeds 1k requests/day, and the output is rigid JSON without semantic variation. Fine-tuned GPT-3.5-Turbo costs $0.30 per 1k requests vs $3.00 for few-shot GPT-4-Turbo, with <2% accuracy loss on fixed schemas. Use GPT-4-Turbo for evolving schemas or volumes <100 requests/day.
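The thresholds above can be written down as a small decision rule. This is a minimal sketch, not a definitive policy: `Workload`, `recommend`, and the returned strings are hypothetical names, and treating a schema that is not fixed for more than 30 days as "evolving" is an assumption. The report gives no rule for volumes between 100 and 1k requests/day, so the sketch returns "undecided" there.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    schema_fixed_days: int   # days the schema is expected to stay unchanged
    requests_per_day: int
    rigid_json: bool         # True if output is rigid JSON without semantic variation


def recommend(w: Workload) -> str:
    """Apply the report's thresholds; names and return strings are illustrative."""
    if w.schema_fixed_days > 30 and w.requests_per_day > 1000 and w.rigid_json:
        return "fine-tune gpt-3.5-turbo"
    # Assumption: a schema not fixed for more than 30 days counts as "evolving".
    if w.schema_fixed_days <= 30 or w.requests_per_day < 100:
        return "gpt-4-turbo few-shot"
    # The report gives no rule for the remaining cases.
    return "undecided"


if __name__ == "__main__":
    print(recommend(Workload(schema_fixed_days=90, requests_per_day=5000, rigid_json=True)))
```

For the "undecided" band, a break-even calculation on your own prices and volume is probably the more useful guide.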
Journey Context:
Fine-tuning fixes the output format and task-specific patterns, which reduces token count (no long few-shot examples needed) and latency. Common error: fine-tuning on diverse schemas. Fine-tuning locks you to specific field names and value types. Break-even analysis: at the prices above, fine-tuning saves $2.70 per 1k requests, so a $500 upfront fine-tuning cost pays back after about 185k requests. That is roughly 185 days at 1k requests/day; a 2-week payback needs about 13k requests/day. Critical nuance: fine-tuned models fail on out-of-distribution document layouts that GPT-4 handles zero-shot, creating a hidden quality cliff when suppliers change invoice formats.
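The break-even arithmetic can be checked with a short calculation. The sketch below assumes a one-time upfront cost and flat per-1k-request prices. The function names are hypothetical, and the inputs are the figures quoted in this report ($500 upfront, $3.00 vs $0.30 per 1k requests). With those inputs, payback at 1k requests/day comes out near 185 days, much longer than two weeks, so rerun it with your own prices and volume.

```python
import math


def break_even_days(upfront_cost: float,
                    baseline_cost_per_1k: float,
                    finetuned_cost_per_1k: float,
                    requests_per_day: float) -> float:
    """Days until the upfront cost is recovered by per-request savings."""
    saving_per_request = (baseline_cost_per_1k - finetuned_cost_per_1k) / 1000
    if saving_per_request <= 0 or requests_per_day <= 0:
        return math.inf  # the fine-tuned model never pays back
    return upfront_cost / (saving_per_request * requests_per_day)


def volume_for_payback(upfront_cost: float,
                       baseline_cost_per_1k: float,
                       finetuned_cost_per_1k: float,
                       target_days: float) -> float:
    """Requests/day needed to recover the upfront cost within target_days."""
    saving_per_request = (baseline_cost_per_1k - finetuned_cost_per_1k) / 1000
    if saving_per_request <= 0 or target_days <= 0:
        return math.inf
    return upfront_cost / (saving_per_request * target_days)


if __name__ == "__main__":
    # Figures taken from the report; treat them as assumptions for your workload.
    print(round(break_even_days(500, 3.00, 0.30, 1000)))    # about 185 days
    print(round(volume_for_payback(500, 3.00, 0.30, 14)))   # about 13k requests/day
```

The model ignores retraining costs after a schema change and any GPT-4 fallback traffic for out-of-distribution layouts, both of which would lengthen the payback period.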
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:15:34.491556+00:00— report_created — created