Report #24827
[cost_intel] Using few-shot GPT-4 for structured JSON extraction when a fine-tuned GPT-3.5 delivers comparable accuracy at one-tenth the cost
If the extraction schema is stable (>1000 examples available), fine-tune GPT-3.5-turbo or Gemini 1.5 Flash; reserve few-shot GPT-4 for schemas that are still in flux.
Journey Context:
Teams building invoice parsers or log analyzers often default to GPT-4 with 5-shot examples, paying $0.03 per request. For stable schemas (e.g., 'extract these 12 fields from PDF text'), fine-tuning a smaller model (GPT-3.5 or Llama-3.1-8B) on 2k examples achieves comparable F1 (0.92 vs 0.94) at $0.003 per request. The hidden cost: fine-tuning requires curated data plus $200-500 in training cost. Break-even is typically around 10k requests: at a saving of $0.027 per request, $200-500 of training cost is recovered after roughly 7,400 to 18,500 requests, not counting the effort of curating the data. The mistake: using few-shot prompting for high-volume, stable tasks. The rule: if the schema changes weekly, use few-shot GPT-4; if it is stable and volume exceeds 10k requests per month, fine-tune.
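The break-even arithmetic and the rule above can be sketched in a few lines of Python. This is a minimal illustration, not a pricing tool: the function names `break_even_requests` and `choose_approach` are hypothetical, the per-request and training costs are the figures quoted in this report, and data-curation effort is left out. The report does not say what to do with a stable schema at low volume; the sketch assumes few-shot in that case.

```python
import math

# Figures quoted in this report (USD); real prices vary by provider and over time.
FEW_SHOT_COST_PER_REQUEST = 0.03
FINE_TUNED_COST_PER_REQUEST = 0.003
MONTHLY_VOLUME_THRESHOLD = 10_000


def break_even_requests(training_cost, few_shot_cost, fine_tuned_cost):
    """Number of requests after which the one-off training cost is recovered."""
    saving_per_request = few_shot_cost - fine_tuned_cost
    if saving_per_request <= 0:
        raise ValueError("fine-tuned model must be cheaper per request")
    return math.ceil(training_cost / saving_per_request)


def choose_approach(schema_stable, monthly_requests):
    """Apply the report's rule of thumb.

    Assumption: a stable schema at or below the volume threshold stays on
    few-shot, since the report does not cover that case explicitly.
    """
    if schema_stable and monthly_requests > MONTHLY_VOLUME_THRESHOLD:
        return "fine-tune"
    return "few-shot"


if __name__ == "__main__":
    for training_cost in (200, 500):
        n = break_even_requests(
            training_cost, FEW_SHOT_COST_PER_REQUEST, FINE_TUNED_COST_PER_REQUEST
        )
        print(f"training cost ${training_cost}: break-even after {n} requests")
    print(choose_approach(schema_stable=True, monthly_requests=25_000))
```

With the report's numbers, the $200 and $500 training costs break even after 7,408 and 18,519 requests respectively, which brackets the "around 10k" estimate.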
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:04:41.866467+00:00— report_created — created