Report #47997
[cost_intel] Using few-shot GPT-4 for high-volume entity extraction instead of fine-tuning smaller models
Fine-tune GPT-3.5-turbo or Claude Haiku when processing >10k similar documents/month with stable schemas
Journey Context:
OpenAI's fine-tuning case studies demonstrate that a fine-tuned GPT-3.5-turbo matches GPT-4 few-shot accuracy on narrow extraction tasks (e.g., invoice parsing, resume entity extraction) at 1/20th the inference cost: GPT-4 costs $30/M output tokens versus $1.50/M for fine-tuned GPT-3.5-turbo. Given the $200-500 cost of a fine-tuning job, break-even occurs at approximately 5,000 requests/day. Critical constraint: the extraction schema must be stable; if field definitions change weekly, retraining costs dominate. Fine-tuned models also lose the frontier model's reasoning edge on ambiguous cases, so use a hybrid approach: the fine-tuned model for extraction, and a frontier model to check low-confidence extractions.
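The break-even arithmetic can be sketched as follows. This is a rough illustration, not the report's own calculation: it counts output tokens only, ignores input-token and retraining costs, and the 200 output tokens per request is an assumption (the report does not state a request size). The names `CostInputs`, `saving_per_request`, `break_even_requests`, and `break_even_days` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostInputs:
    finetune_job_usd: float         # one-off training cost (report quotes $200-500)
    frontier_usd_per_m_out: float   # report's figure: $30 per 1M output tokens
    finetuned_usd_per_m_out: float  # report's figure: $1.50 per 1M output tokens
    out_tokens_per_request: int     # ASSUMPTION: not given in the report


def saving_per_request(c: CostInputs) -> float:
    """USD saved per request by switching models, counting output tokens only."""
    delta_per_token = (c.frontier_usd_per_m_out - c.finetuned_usd_per_m_out) / 1_000_000
    return delta_per_token * c.out_tokens_per_request


def break_even_requests(c: CostInputs) -> float:
    """Number of requests after which savings repay the fine-tuning job."""
    return c.finetune_job_usd / saving_per_request(c)


def break_even_days(c: CostInputs, requests_per_day: float) -> float:
    """Days to repay the fine-tuning job at a steady daily volume."""
    return break_even_requests(c) / requests_per_day


if __name__ == "__main__":
    # Upper end of the quoted job cost, with an assumed 200 output tokens/request.
    inputs = CostInputs(500.0, 30.0, 1.50, 200)
    print(f"requests to break even: {break_even_requests(inputs):,.0f}")
    print(f"days at 5,000 req/day:  {break_even_days(inputs, 5_000):.1f}")
```

Under these assumptions the payback period scales inversely with both request volume and output length, so short extractions at low volume take much longer to recover the training cost. If the schema changes and the model must be retrained, `finetune_job_usd` is paid again each time.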
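The hybrid approach could be wired up roughly like this. It is a minimal sketch under stated assumptions: the serving API exposes per-token log-probabilities for the fine-tuned model's output, confidence is taken as the geometric-mean token probability, and the 0.9 threshold is a placeholder to be tuned on labeled data. `cheap_extract` and `frontier_review` are hypothetical callables standing in for the real model calls, not any vendor's API.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Extraction:
    fields: dict[str, str]
    token_logprobs: Sequence[float]  # per-token log-probabilities of the output


def confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability in (0, 1]; 0.0 when there are no tokens."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def extract_with_fallback(
    document: str,
    cheap_extract: Callable[[str], Extraction],                            # hypothetical
    frontier_review: Callable[[str, dict[str, str]], dict[str, str]],      # hypothetical
    threshold: float = 0.9,                                                # ASSUMPTION: tune on real data
) -> tuple[dict[str, str], str]:
    """Return (fields, source), where source names the model that produced them."""
    first_pass = cheap_extract(document)
    if confidence(first_pass.token_logprobs) >= threshold:
        return first_pass.fields, "fine_tuned"
    # Low confidence: hand the document and the draft fields to the frontier model.
    return frontier_review(document, first_pass.fields), "frontier"
```

Only the low-confidence fraction of documents incurs frontier-model pricing, so the cost saving depends on how often the fine-tuned model falls below the threshold. A whole-output average can hide one uncertain field; scoring each field's tokens separately is a possible refinement.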
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:02:51.960334+00:00— report_created — created