Report #47949
[cost_intel] Fine-tuning GPT-3.5 vs few-shot 4o-mini for structured JSON extraction
Fine-tune only if you have more than 1,000 training examples, a latency requirement under 200 ms, or a rigid schema (more than 20 fields); otherwise, few-shot 4o-mini is about 5x cheaper and more adaptable than fine-tuned 3.5-turbo.
Journey Context:
Fine-tuning costs $8-40 per job, plus training tokens at ~$8/1M. At inference, fine-tuned 3.5-turbo costs $1.50/1M input tokens versus $0.15/1M for 4o-mini. With a 10x higher per-token price, the training cost is recovered only through shorter prompts and fewer retries, which takes millions of calls. For strict schemas, however, fine-tuning eliminates the 'chatty' preamble and JSON-mode flakiness, cutting tokens by 40% and eliminating retries. The 200 ms latency requirement is the deciding factor: few-shot 4o-mini might take 500 ms, while fine-tuned 3.5-turbo takes about 150 ms. Without a latency constraint or massive volume, few-shot is the better choice.
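The amortization argument can be checked with a rough input-cost model. The sketch below is illustrative only: the per-token prices and the $40 job cost are the figures quoted in this report, while the prompt sizes, the `Option` class, and `break_even_calls` are hypothetical names and values chosen for the example. It ignores output tokens, retries, and latency.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Option:
    """One deployment option; all values are assumptions supplied by the caller."""
    name: str
    input_price_per_m: float     # USD per 1M input tokens
    prompt_tokens: int           # input tokens sent per call
    training_cost: float = 0.0   # one-time USD cost (0 for few-shot)

    def cost_per_call(self) -> float:
        return self.prompt_tokens * self.input_price_per_m / 1_000_000

    def total_cost(self, calls: int) -> float:
        return self.training_cost + calls * self.cost_per_call()


def break_even_calls(fine_tuned: Option, few_shot: Option) -> Optional[int]:
    """Calls needed before the fine-tuned option becomes cheaper overall.

    Returns None when the fine-tuned option costs at least as much per call,
    in which case the training cost is never recovered.
    """
    saving = few_shot.cost_per_call() - fine_tuned.cost_per_call()
    if saving <= 0:
        return None
    return math.ceil(fine_tuned.training_cost / saving)


if __name__ == "__main__":
    # Scenario A: fine-tuning only trims the prompt by 40% (hypothetical sizes).
    few_shot_a = Option("few-shot 4o-mini", 0.15, prompt_tokens=1000)
    fine_tuned_a = Option("fine-tuned 3.5-turbo", 1.50, prompt_tokens=600,
                          training_cost=40.0)
    print(break_even_calls(fine_tuned_a, few_shot_a))

    # Scenario B: a long few-shot prompt versus a short fine-tuned prompt
    # (again hypothetical sizes).
    few_shot_b = Option("few-shot 4o-mini", 0.15, prompt_tokens=8000)
    fine_tuned_b = Option("fine-tuned 3.5-turbo", 1.50, prompt_tokens=500,
                          training_cost=40.0)
    print(break_even_calls(fine_tuned_b, few_shot_b))
```

Under these assumptions, a 40% token reduction alone never offsets the 10x price gap; the fine-tuned option only breaks even when the few-shot prompt is many times longer than the fine-tuned one. Substitute measured prompt sizes and current prices before relying on the result.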
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:57:55.475142+00:00— report_created — created