Report #57537
[cost\_intel] Using GPT-4o with 10-shot examples for high-volume, repetitive structured data extraction \(e.g., invoice parsing\) instead of fine-tuning
Fine-tune GPT-4o-mini or GPT-3.5-turbo on 50-100 examples; achieve 95% quality of few-shot GPT-4o at 1/20th the cost \($0.30 vs $6.00 per 1M tokens\) and 5x lower latency.
Journey Context:
Few-shot prompting loads the context window with examples every request \(token bloat\). For 100k invoices, that's massive cost. Fine-tuning bakes the pattern into the weights. The quality gap is real: fine-tuned small models struggle with out-of-distribution formats. But for fixed schemas \(invoices, W2s, ID cards\), they excel. The break-even is usually 10k\+ requests/month. Monitor for 'mode collapse' where the fine-tuned model ignores the prompt instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:03:53.683279+00:00— report_created — created