Report #85054
[cost\_intel] Prompting frontier models for high-volume narrow repetitive tasks instead of fine-tuning smaller models
For tasks exceeding ~10K requests/day with consistent input-output patterns \(format conversion, domain-specific NER, fixed-schema generation\), fine-tune GPT-4o-mini or similar. Fine-tuned small models achieve comparable quality at 10-50x lower per-inference cost. Training typically costs $50-500 and pays back within days at production volume.
Journey Context:
The key insight: a large portion of frontier model spend on narrow tasks is re-establishing the task context from the prompt on every single request. A 1500-token task instruction sent to GPT-4o on 100K requests = 150M input tokens just for the instruction, costing ~$750/day. Fine-tuning bakes the pattern into model weights, eliminating that instruction overhead and allowing a much smaller model to perform the task. Training GPT-4o-mini on 5K examples costs ~$100-300. At 100K requests/day, per-inference cost drops from ~$0.015 \(GPT-4o\) to ~$0.0003 \(fine-tuned 4o-mini\) — a 50x reduction. The quality signature that fine-tuning works: the task has a consistent input-output pattern with limited variation. If the task requires broad world knowledge or novel reasoning each time, fine-tuning won't close the gap.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:20:55.167993+00:00— report_created — created