Report #42997
[cost_intel] Fine-tuning vs few-shot prompting break-even miscalculation for JSON extraction
Fine-tune GPT-3.5-turbo for structured JSON extraction only when the schema is consistent and daily volume exceeds 2,000 requests. Below this threshold, 3-shot prompting with GPT-4o-mini is cheaper despite the extra example tokens in every request, because fine-tuning incurs training costs ($8-40) plus hosting overhead ($1.25/1M tokens vs $0.60/1M for base 3.5-turbo). The break-even is at ~5,000 daily requests, with the training cost amortized over 30 days.
Journey Context:
Teams assume fine-tuning always reduces costs because 'a custom model is cheaper.' In reality, fine-tuned 3.5-turbo costs $8/1M input tokens vs $3/1M for base 4o-mini, and the training cost is paid upfront. At low volume (<2k/day), the training cost never amortizes. For high-volume, consistent extraction (invoice parsing, form filling), fine-tuning eliminates the 3-shot example tokens (saving 500-1000 tokens per request), which at scale outweighs the fine-tuned model's per-token price premium. Rule: if the schema is static and volume is >5k/day, fine-tune; otherwise use few-shot prompting with 4o-mini.
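The break-even reasoning above can be sketched as a small calculation. This is a minimal sketch under stated assumptions, not a pricing tool: the Scenario fields, the function names, and the 300-token base prompt are hypothetical, and the prices are the figures quoted in this report, which should be replaced with current published rates before relying on the result. The model counts input tokens only and ignores output tokens, retraining, and evaluation costs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scenario:
    """Inputs for a rough break-even estimate (all values are assumptions)."""
    prompt_tokens: int             # input tokens per request, excluding few-shot examples
    few_shot_tokens: int           # extra tokens the 3-shot examples add per request
    few_shot_price_per_m: float    # $ per 1M input tokens for the few-shot model
    fine_tuned_price_per_m: float  # $ per 1M input tokens for the fine-tuned model
    training_cost: float           # one-off fine-tuning cost in $
    amortization_days: int         # days over which the training cost is spread


def per_request_cost(tokens: int, price_per_m: float) -> float:
    """Input-token cost of a single request in dollars."""
    return tokens * price_per_m / 1_000_000


def break_even_daily_requests(s: Scenario) -> Optional[float]:
    """Daily volume above which fine-tuning is cheaper, or None if it never is."""
    few_shot = per_request_cost(s.prompt_tokens + s.few_shot_tokens,
                                s.few_shot_price_per_m)
    fine_tuned = per_request_cost(s.prompt_tokens, s.fine_tuned_price_per_m)
    saving = few_shot - fine_tuned
    if saving <= 0:
        # Dropping the examples does not offset the per-token premium,
        # so no volume recovers the training cost.
        return None
    return (s.training_cost / s.amortization_days) / saving


if __name__ == "__main__":
    # Prices and training cost as quoted in this report; prompt size is hypothetical.
    scenario = Scenario(prompt_tokens=300, few_shot_tokens=1000,
                        few_shot_price_per_m=3.0, fine_tuned_price_per_m=8.0,
                        training_cost=40.0, amortization_days=30)
    print(break_even_daily_requests(scenario))
```

The result is very sensitive to the base prompt size: in this model, once the prompt is long enough that the fine-tuned per-token premium exceeds the saving from dropping the examples, the function returns None and no volume justifies fine-tuning. It is worth running with measured token counts rather than trusting a fixed requests-per-day threshold.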
Lifecycle
2026-06-19T02:38:37.531454+00:00— report_created — created