Report #53897
[cost\_intel] Using frontier model prompting for repetitive classification or formatting at high volume
Fine-tune a small model \(GPT-4o-mini, Haiku\) on 500-1000 labeled examples of your specific task. At >5K requests/day the fine-tuning cost amortizes within one week and inference cost drops 10-17x with equal or better task-specific quality.
Journey Context:
The default pattern is GPT-4o or Sonnet with detailed system prompts and few-shot examples for classification and formatting. This works but is expensive at scale. Fine-tuning GPT-4o-mini costs roughly $100-300 for 1000 examples. Inference is $0.15/M input vs $2.50/M for GPT-4o — a 17x reduction. Quality is often better because the model internalizes the task distribution rather than relying on in-context learning which is brittle to prompt variations and input drift. The break-even at 5K requests/day hits within a week. Below that volume the fine-tuning cost does not amortize fast enough and few-shot prompting on a small model is the better play. Also consider: fine-tuned models need no few-shot examples in the prompt, saving those input tokens too.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:57:47.835120+00:00— report_created — created