Report #26876
[cost_intel] Using frontier models for high-volume narrow tasks instead of fine-tuning smaller models
Fine-tune GPT-4o-mini or Haiku when you have 500+ examples and expect 50K+ inference calls. Training cost amortizes quickly, and per-token inference on fine-tuned small models is 10-20x cheaper than prompting frontier models with long instruction prefixes.
Journey Context:
The economics of fine-tuning vs. prompting: prompting a frontier model with a 2K-token instruction prefix on every call means paying frontier input prices for static instructions that could be baked into the model weights. At 100K calls with 2K input tokens each (200M input tokens in total), GPT-4 costs approximately $6,000 in input tokens. A fine-tuned GPT-4o-mini costs approximately $150 in input tokens (at mini pricing) plus $100-500 in training cost, for $250-650 total. Break-even is typically 30K-100K inference calls, depending on prompt length and the price gap between model tiers. Fine-tuning also reduces latency (shorter prompts mean faster time-to-first-token) and can improve quality on narrow tasks because the behavior is learned rather than instructed: the model does not need to parse instructions on every call. The catch: fine-tuning requires curated training data (500+ high-quality input-output pairs), has an upfront setup cost, and only works for stable tasks. Do not fine-tune for tasks that change weekly. Best candidates: output formatting, domain-specific classification, and code generation patterns specific to your codebase.
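The cost comparison above can be sketched as a small calculator. This is a minimal sketch, not a pricing tool: the function names are hypothetical, and the per-million-token prices ($30 and $0.75) are back-computed from the report's own $6,000 and $150 figures rather than taken from any current price sheet. It also keeps the report's simplification of 2K input tokens per call for both models, even though a fine-tuned model would normally drop most of the prefix.

```python
import math
from typing import Optional


def input_cost(calls: int, tokens_per_call: int, price_per_million: float) -> float:
    """Total input-token cost in dollars for a given number of calls."""
    return calls * tokens_per_call * price_per_million / 1_000_000


def break_even_calls(
    training_cost: float,
    frontier_tokens: int,
    frontier_price: float,
    tuned_tokens: int,
    tuned_price: float,
) -> Optional[int]:
    """Smallest number of calls at which fine-tuning pays back its training cost.

    Prices are dollars per million input tokens. Returns None when the
    fine-tuned model is not cheaper per call, so there is no break-even.
    """
    saving_per_call = (
        frontier_tokens * frontier_price - tuned_tokens * tuned_price
    ) / 1_000_000
    if saving_per_call <= 0:
        return None
    return math.ceil(training_cost / saving_per_call)


if __name__ == "__main__":
    # Assumed prices, implied by the report's figures (not current list prices).
    FRONTIER_PRICE = 30.0   # $/M input tokens, from $6,000 / 200M tokens
    TUNED_PRICE = 0.75      # $/M input tokens, from $150 / 200M tokens

    print(input_cost(100_000, 2_000, FRONTIER_PRICE))  # 6000.0
    print(input_cost(100_000, 2_000, TUNED_PRICE))     # 150.0

    # Break-even for the low and high ends of the $100-500 training range.
    for training in (100, 500):
        print(training, break_even_calls(training, 2_000, FRONTIER_PRICE,
                                         2_000, TUNED_PRICE))
```

The break-even point is very sensitive to the price gap. With a gap as wide as the GPT-4 example, training cost is recovered well before 30K calls. The 30K-100K range corresponds to a narrower gap between tiers, so it is worth rerunning the calculation with your own prices and prompt lengths.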
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:30:32.207837+00:00— report_created — created