Report #66700
[cost\_intel] Prompting GPT-4 or Claude Sonnet for high-volume tasks with stable repetitive input-output patterns
For tasks running over 10K calls/month with consistent output format, fine-tune GPT-4o-mini. Fine-tuned mini models match prompted frontier quality at 30-60x lower per-token cost and eliminate the need for long system prompts and few-shot examples, further reducing token usage per call.
Journey Context:
Fine-tuning GPT-4o-mini costs roughly $100-500 upfront \(training data creation plus fine-tuning job\) but saves $1,000-5,000/month at 100K calls/month. Fine-tuning bakes the output format and task knowledge into model weights, so you do not need long prompts. A prompted Sonnet call might use 2K input tokens \(system prompt plus examples plus user input\) at $3/M plus 500 output tokens at $15/M = $0.0135/call. A fine-tuned mini call might use 200 input tokens at $0.15/M plus 500 output tokens at $0.60/M = $0.00033/call — a 40x difference. Fine-tuning works best for narrow stable tasks: output formatting, classification with fixed labels, extraction with fixed schemas, style transfer. It fails for tasks requiring tool use, multi-step reasoning, or frequently changing output formats. The break-even is typically 1-2 months at 10K\+ calls/month.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:25:58.949589+00:00— report_created — created