Report #82440
[cost_intel] Fine-tuning is only for style transfer, not for structured data extraction
Fine-tune GPT-4o-mini for high-volume structured extraction tasks (invoice parsing, form field extraction) when daily volume exceeds 100k requests. Break-even occurs at ~50k processed examples when the $200-300 training cost is amortized against the per-token saving ($0.60 per 1M tokens for the fine-tuned model vs $3.00 per 1M for GPT-4o); reduced output length adds further savings.
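The break-even figure can be checked with a short calculation. This is a minimal sketch, not a billing model: the $250 training cost is the midpoint of the quoted range, and the 2,000 tokens per request is an assumption that does not appear in the report (the function names are also illustrative). Under those assumptions the result lands close to the ~50k figure.

```python
def break_even_tokens(training_cost_usd: float,
                      base_price_per_m: float,
                      tuned_price_per_m: float) -> float:
    """Tokens that must be processed before the per-token saving
    pays back the one-off training cost."""
    saving_per_token = (base_price_per_m - tuned_price_per_m) / 1_000_000
    if saving_per_token <= 0:
        raise ValueError("fine-tuned price must be below the base price")
    return training_cost_usd / saving_per_token


def break_even_requests(training_cost_usd: float,
                        base_price_per_m: float,
                        tuned_price_per_m: float,
                        tokens_per_request: float) -> float:
    """Same break-even point, expressed in requests."""
    tokens = break_even_tokens(training_cost_usd, base_price_per_m, tuned_price_per_m)
    return tokens / tokens_per_request


if __name__ == "__main__":
    # Prices from the report; training cost is the midpoint of $200-300.
    # tokens_per_request=2000 is an ASSUMPTION, not a figure from the report.
    requests = break_even_requests(250.0, 3.00, 0.60, tokens_per_request=2000)
    print(f"break-even after about {requests:,.0f} requests")
```

At the stated volume of 100k requests per day, a break-even near 50k requests would be reached in under a day, which is why the recommendation is tied to volume rather than to training cost.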
Journey Context:
Teams often use few-shot GPT-4o for extraction, paying 5x per token compared with a fine-tuned smaller model. Fine-tuning for JSON schema adherence reduces output token variance (eliminating retry loops) and makes a cheaper base model usable. Quality degradation signature: fine-tuned small models fail on out-of-distribution formats (e.g., new invoice layouts), where few-shot large models generalize better. Implement a hybrid router: send known formats (high confidence) to the fine-tuned GPT-4o-mini, and fall back to GPT-4o for unknown layouts, detected when embedding similarity to known formats drops below 0.85.
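The routing rule can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: it assumes cosine similarity as the similarity measure, assumes embeddings for the known layouts have already been computed by whatever embedding model the team uses, and uses a placeholder string for the fine-tuned model ID. Only the 0.85 threshold and the two model roles come from the report.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD = 0.85                   # threshold quoted in the report
FINE_TUNED_MODEL = "fine-tuned-gpt-4o-mini"   # hypothetical placeholder ID
FALLBACK_MODEL = "gpt-4o"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def choose_model(doc_embedding: Sequence[float],
                 known_format_embeddings: Sequence[Sequence[float]],
                 threshold: float = SIMILARITY_THRESHOLD) -> str:
    """Pick the fine-tuned model when the document resembles a known layout,
    otherwise fall back to the larger model."""
    best = max(
        (cosine_similarity(doc_embedding, ref) for ref in known_format_embeddings),
        default=0.0,  # no known formats yet: always fall back
    )
    return FINE_TUNED_MODEL if best >= threshold else FALLBACK_MODEL


if __name__ == "__main__":
    known = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # toy embeddings of known layouts
    print(choose_model([0.9, 0.1, 0.0], known))  # close to the first known layout
    print(choose_model([0.0, 0.0, 1.0], known))  # unlike any known layout
```

Logging the best similarity score alongside each routing decision makes it easier to tune the threshold later and to spot new layouts that deserve a place in the fine-tuning set.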
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:58:11.840501+00:00— report_created — created