Report #45937
[cost\_intel] Over-reliance on GPT-4o few-shot for high-field-count structured extraction
Fine-tune GPT-4o-mini for extraction tasks with >10 output fields and >500 training examples; achieves 94% accuracy vs 96% for 4o few-shot at 1/8th the cost \($0.60 vs $5.00/1M tokens\), amortizing training cost \($3/1M tokens\) after ~50k inference calls.
Journey Context:
Teams default to 4o with 5-shot prompting for JSON extraction, paying $60/1M output tokens. For stable schemas \(receipts, forms\), fine-tuning 4o-mini on 500 examples achieves comparable F1 scores \(within 2%\) because the task becomes deterministic pattern matching. The break-even is ~50k calls considering training burn. This prevents 'token bleeding' from verbose few-shot examples \(which add 500\+ tokens per call\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:34:47.218270+00:00— report_created — created