Report #86813
[cost\_intel] Using frontier models with long prompts for high-volume structured data extraction
Fine-tune GPT-4o-mini on 500-2000 examples for narrow extraction tasks \(receipts, invoices, medical records, log parsing\). Achieves equal or better quality at 15-25x lower inference cost. Break-even volume is approximately 500-1000 requests.
Journey Context:
A typical extraction prompt runs 1500-3000 tokens \(instructions \+ schema \+ examples\). At GPT-4o pricing \($2.50/M input, $10/M output\), extracting from 1M documents costs $5,000-15,000. Fine-tuned GPT-4o-mini \($0.15/M input, $0.60/M output\) with a 200-token prompt costs $300-500 for the same volume — a 15-25x reduction. Training on 1000 examples costs roughly $5-20. The key insight: fine-tuning internalizes the schema and format into model weights, eliminating the need for verbose per-request schema descriptions and few-shot examples. It also reduces output token waste because the model learns your exact format without generating schema-mandated null fields. Below ~500 requests, the training data preparation overhead makes prompt engineering on frontier models more economical.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:18:24.115468+00:00— report_created — created