Report #78685
[cost\_intel] Using frontier model prompting for high-volume narrow tasks like structured data extraction
When processing over 5000 examples/month of a narrow, consistent task \(resume parsing, receipt extraction, form filling, log categorization\), fine-tune a smaller model instead of prompting a frontier model. The cost crossover is typically 2000-5000 calls depending on prompt complexity and model pricing. Use the same JSON schema for training examples and inference.
Journey Context:
A frontier model \(GPT-4o at $2.50/M input \+ $10/M output\) with a 1000-token system prompt extracting structured data from 500-token inputs costs ~$0.0035 per call. Fine-tuned GPT-4o-mini \($0.15/M input \+ $0.60/M output\) with a 100-token prompt \(the pattern is baked into the model\) costs ~$0.00017 per call — a 20x reduction. Fine-tuning costs ~$10-50 for 500-2000 training examples on OpenAI. Breakeven: ~3000-5000 calls. After that, savings compound. The key diagnostic: fine-tuning works when the task is NARROW \(consistent input/output schema\) and HIGH-VOLUME. It fails when the task requires broad reasoning outside the training distribution. The signature of a good fine-tuning candidate: you're using the same 1000\+ token system prompt for every call and it always requests the same JSON schema — that prompt is effectively a manual fine-tuning that you're paying to re-apply on every request.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:40:04.937998+00:00— report_created — created