Report #41069
[cost_intel] When does fine-tuning GPT-3.5-turbo beat GPT-4o prompting for structured extraction?
Fine-tune GPT-3.5-turbo for structured extraction when you have more than 1,000 training examples and the output schema has more than 10 fields with strict validation rules. In that regime it beats GPT-4o prompting at 5x volume, with a 4x cost reduction and lower latency.
Journey Context:
People default to GPT-4-class models for structured output. But a fine-tuned small model learns the schema as 'muscle memory' and does not hallucinate optional fields, whereas GPT-4o can overthink and emit invalid JSON keys or comments. Fine-tuning locks in the schema format. The trade-off is an upfront training cost of $2-5k against ongoing API costs: at high volume, the per-token savings recover the training cost within weeks. Critical: fine-tuning fails on out-of-distribution inputs, where GPT-4o generalizes better.
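The break-even reasoning above can be sketched as a small calculation. This is a minimal sketch, not a pricing reference: the function names are hypothetical, and the per-1,000-request costs in the example are placeholder numbers chosen only to reflect the 4x ratio and the $2-5k training range stated in this report. Substitute your own measured costs, and note that it ignores evaluation, retraining, and fallback traffic.

```python
import math
from typing import Optional


def breakeven_requests(
    training_cost_usd: float,
    baseline_cost_per_1k_usd: float,
    finetuned_cost_per_1k_usd: float,
) -> Optional[int]:
    """Requests needed before cumulative savings cover the training cost.

    Costs are expressed per 1,000 requests. Returns None when the
    fine-tuned model is not cheaper, since break-even is never reached.
    """
    saving_per_1k = baseline_cost_per_1k_usd - finetuned_cost_per_1k_usd
    if saving_per_1k <= 0:
        return None
    return math.ceil(training_cost_usd / saving_per_1k * 1000)


def breakeven_days(
    training_cost_usd: float,
    baseline_cost_per_1k_usd: float,
    finetuned_cost_per_1k_usd: float,
    requests_per_day: int,
) -> Optional[float]:
    """Days of traffic needed to reach break-even at a steady daily volume."""
    n = breakeven_requests(
        training_cost_usd, baseline_cost_per_1k_usd, finetuned_cost_per_1k_usd
    )
    if n is None:
        return None
    return n / requests_per_day


if __name__ == "__main__":
    # ASSUMPTION: placeholder costs, not real API prices.
    # $3,000 training cost; $4.00 vs $1.00 per 1,000 requests (a 4x ratio).
    print(breakeven_requests(3000, 4.0, 1.0))
    print(breakeven_days(3000, 4.0, 1.0, 50_000))
```

With these placeholder inputs, break-even lands at one million requests, which is about 20 days at 50,000 requests per day. At low volume the same formula gives a payback period of many months, which is why volume matters more than the per-request ratio.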
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:24:15.056799+00:00— report_created — created