Report #87136
[cost_intel] When does fine-tuning GPT-4o-mini beat few-shot prompting with GPT-4o for JSON extraction?
Fine-tune when: (1) the output schema has more than 10 nested fields, (2) you have more than 500 training examples, (3) the latency requirement is under 500 ms. At 1M requests/month, fine-tuned GPT-4o-mini is 50x cheaper than GPT-4o few-shot, with a 2% quality regression on complex schemas but equivalent quality on flat key-value extraction.
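The three thresholds above can be written as a small checklist. This is only a sketch: the `Workload` type and function names are hypothetical, and it assumes the criteria are meant to hold together (the report does not say whether any one of them is enough on its own).

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an extraction workload."""
    nested_fields: int        # nested fields in the output schema
    training_examples: int    # labeled examples available for fine-tuning
    latency_budget_ms: float  # per-request latency requirement


def fine_tune_criteria(w: Workload) -> dict[str, bool]:
    """Evaluate each threshold from the report separately."""
    return {
        "complex_schema": w.nested_fields > 10,
        "enough_data": w.training_examples > 500,
        "tight_latency": w.latency_budget_ms < 500,
    }


def should_fine_tune(w: Workload) -> bool:
    # Assumption: all three criteria must hold at once.
    return all(fine_tune_criteria(w).values())


if __name__ == "__main__":
    candidate = Workload(nested_fields=14, training_examples=800, latency_budget_ms=300)
    print(fine_tune_criteria(candidate), should_fine_tune(candidate))
```

Returning the per-criterion result alongside the combined verdict makes it easy to see which threshold a workload misses.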
Journey Context:
Everyone tries few-shot first, but token costs explode with complex schemas (the JSON structure is repeated in every prompt). Fine-tuning bakes the schema into the weights. The surprise: fine-tuned smaller models handle nested validation better than few-shot large models because they're not 'distracted' by instruction following. But you need enough data to avoid overfitting: with fewer than 500 examples, the model memorizes rather than generalizes.
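To see why repeated schema examples dominate few-shot cost, a rough back-of-the-envelope model helps. This is a sketch under stated assumptions: the function names are hypothetical, and every token count and price below is a placeholder, not a real rate. Substitute current per-token pricing and your own measured token counts.

```python
def few_shot_input_tokens(instruction_tokens: int, example_tokens: int,
                          n_examples: int, document_tokens: int) -> int:
    """Input tokens per request when every prompt repeats the examples."""
    return instruction_tokens + example_tokens * n_examples + document_tokens


def monthly_cost_usd(requests: int, input_tokens: int, output_tokens: int,
                     price_in_per_m: float, price_out_per_m: float) -> float:
    """Monthly spend given per-request token counts and prices per 1M tokens."""
    per_request_scaled = input_tokens * price_in_per_m + output_tokens * price_out_per_m
    return requests * per_request_scaled / 1_000_000


if __name__ == "__main__":
    # All numbers are illustrative placeholders.
    requests = 1_000_000
    few_shot_in = few_shot_input_tokens(
        instruction_tokens=100, example_tokens=400, n_examples=5, document_tokens=500
    )
    # A fine-tuned model is assumed to need only a short instruction plus the document.
    fine_tuned_in = 50 + 500

    few_shot = monthly_cost_usd(requests, few_shot_in, 200, price_in_per_m=1.0, price_out_per_m=2.0)
    fine_tuned = monthly_cost_usd(requests, fine_tuned_in, 200, price_in_per_m=0.1, price_out_per_m=0.2)
    print(f"few-shot: ${few_shot:,.2f}  fine-tuned: ${fine_tuned:,.2f}")
```

The model ignores one-time training cost and any hosting surcharge for fine-tuned models, both of which would shift the break-even point.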
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:50:50.861518+00:00— report_created — created