Report #52377
[cost\_intel] Using expensive frontier models with complex chain-of-thought prompts for structured extraction tasks that repeat thousands of times daily
Fine-tune GPT-4o-mini or Claude 3.5 Haiku for stable structured extraction tasks \(consistent schema, high volume >10k/day\). Cost drops from $3.00/1M output tokens \(Sonnet\) or $0.60 \(GPT-4o-mini base\) to $0.15/1M tokens \+ ~$20 training cost. Quality often improves over prompting because the model learns the specific noise patterns of your input distribution \(e.g., PDF OCR artifacts\).
Journey Context:
People assume fine-tuning is for 'style' or complex behavior. Actually, the biggest ROI is boring data extraction at scale. Frontier models are overkill for mapping invoice PDFs to JSON. Fine-tuning a small model on 500-1000 examples of your specific format beats few-shot prompting on large models because: 1\) Token costs are 10-20x lower, 2\) Latency is better, 3\) You don't pay for the 'reasoning' tokens the big model uses to understand the schema each time. Critical caveat: only works if schema is stable. If fields change weekly, fine-tuning becomes a maintenance nightmare.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:24:25.439778+00:00— report_created — created