Report #85239
[cost_intel] Over-prompting with repeated large instruction prefixes instead of fine-tuning for high-volume narrow tasks
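Consider fine-tuning a small model (GPT-4o-mini, Haiku) when you're sending more than ~10K requests with the same >1,000-token instruction prefix; at that volume it is worth running the numbers. On input-token cost alone, the crossover depends on how many prefix tokens fine-tuning removes, what training costs, and current per-token prices; in the worked example below it lands around 125K-170K requests. It comes sooner if the fine-tuned small model replaces a prompted frontier model, since fine-tuned small models often match prompted frontier models on narrow, repetitive tasks.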
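One way to check whether a workload fits this pattern is to measure how much of each prompt is a shared prefix. This is a minimal sketch, assuming prompts are available as plain strings and using a rough 4-characters-per-token estimate instead of a real tokenizer; `shared_prefix_tokens` and `worth_costing_fine_tune` are hypothetical helpers, and their default thresholds simply mirror the >1,000-token and >10K-request figures above.

```python
import os


def shared_prefix_tokens(prompts: list[str], chars_per_token: float = 4.0) -> int:
    """Rough token count of the prefix shared by every prompt.

    Assumption: ~4 characters per token, a common rule of thumb for English
    text; swap in a real tokenizer for accurate counts.
    """
    if not prompts:
        return 0
    # os.path.commonprefix compares the strings character by character
    prefix = os.path.commonprefix(prompts)
    return int(len(prefix) / chars_per_token)


def worth_costing_fine_tune(prompts: list[str], request_volume: int,
                            min_prefix_tokens: int = 1_000,
                            min_requests: int = 10_000) -> bool:
    """Hypothetical screening check: large shared prefix AND high volume."""
    return (request_volume > min_requests
            and shared_prefix_tokens(prompts) > min_prefix_tokens)


if __name__ == "__main__":
    # Stand-in for a long, repeated instruction prefix (hypothetical text)
    system = "You are a classifier. " * 200
    sample = [system + "Input: apple", system + "Input: banana"]
    print(shared_prefix_tokens(sample),
          worth_costing_fine_tune(sample, request_volume=50_000))
```

Looking only at the literal common prefix undercounts when per-request fields are templated into the middle of the instructions, so treat the result as a lower bound on the repeated-token share.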
Journey Context:
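If every request includes a 2,000-token system prompt with task instructions, output format, and domain context, you pay for those tokens on every call. Fine-tuning internalizes those instructions, which can cut per-request input tokens by 80% or more (2,000 → 400 in this example).

The math, using OpenAI's GPT-4o-mini list prices (~$3 per 1M training tokens, ~$0.15 per 1M input tokens; verify current rates): training on 5K examples × 2,000 tokens = 10M tokens ≈ $30 for one epoch (training is billed per token per epoch, so multi-epoch runs cost proportionally more). After fine-tuning, each request needs only ~400 input tokens instead of 2,000, saving 1,600 tokens ≈ $0.00024 per request, so break-even is $30 ÷ $0.00024 ≈ 125K requests. At 100K requests: without fine-tuning = 200M input tokens × $0.15/1M = $30; with fine-tuning = 40M input tokens × $0.15/1M = $6, plus $30 training = $36, so fine-tuning has not yet paid for itself. At 1M requests: without = 2B tokens = $300; with = 400M tokens = $60, plus $30 training = $90, about 3.3x cheaper.

These figures assume the fine-tuned model is billed at the base rate. At published list prices OpenAI charges a premium for fine-tuned inference (for GPT-4o-mini, $0.30 vs $0.15 per 1M input tokens, with a similar premium on output tokens, which this input-only comparison ignores). With that premium, a fine-tuned request costs 400 × $0.30/1M = $0.00012 versus $0.0003 prompted, break-even moves to roughly 167K requests, and the 1M-request total becomes $120 + $30 = $150, about 2x cheaper.

Quality surprise: fine-tuned small models can match or even exceed prompted frontier models on narrow tasks, because they have internalized the pattern rather than interpreting the instructions fresh each time. The anti-pattern to avoid: fine-tuning for tasks that change frequently, where every retraining run adds training cost and you end up retraining more than you save.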
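The input-cost arithmetic can be checked with a small cost model. This is a sketch under stated assumptions: only input tokens are counted, prices default to the GPT-4o-mini list figures quoted in this report, and `Scenario` with all of its fields is a hypothetical name chosen for illustration. Setting `tuned_price_per_m=0.30` models the fine-tuned inference premium, and `retrains` shows how frequent retraining eats into the savings.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    prompted_input_tokens: int = 2_000    # full instruction prefix + input
    tuned_input_tokens: int = 400         # input after fine-tuning
    base_price_per_m: float = 0.15        # $/1M input tokens (assumed list price)
    tuned_price_per_m: float = 0.15       # set 0.30 to model fine-tuned premium
    training_tokens: int = 10_000_000     # 5K examples x 2,000 tokens
    training_price_per_m: float = 3.0     # $/1M training tokens (assumed)
    epochs: int = 1                       # training is billed per epoch
    retrains: int = 1                     # training runs over the period

    def training_cost(self) -> float:
        return (self.training_tokens * self.epochs * self.retrains
                * self.training_price_per_m / 1e6)

    def prompted_cost(self, requests: int) -> float:
        return requests * self.prompted_input_tokens * self.base_price_per_m / 1e6

    def tuned_cost(self, requests: int) -> float:
        inference = requests * self.tuned_input_tokens * self.tuned_price_per_m / 1e6
        return inference + self.training_cost()

    def break_even_requests(self) -> float:
        # Per-request saving in dollars; no break-even if tuning saves nothing
        saving = (self.prompted_input_tokens * self.base_price_per_m
                  - self.tuned_input_tokens * self.tuned_price_per_m) / 1e6
        if saving <= 0:
            return float("inf")
        return self.training_cost() / saving


if __name__ == "__main__":
    for name, s in [("base rate", Scenario()),
                    ("fine-tuned premium", Scenario(tuned_price_per_m=0.30))]:
        print(f"{name}: break-even ~{s.break_even_requests():,.0f} requests")
        for n in (100_000, 1_000_000):
            print(f"  {n:>9,} requests: prompted ${s.prompted_cost(n):.2f}, "
                  f"fine-tuned ${s.tuned_cost(n):.2f}")
```

Raising `retrains` multiplies the fixed training cost, which is the frequently-changing-task anti-pattern in numeric form: each extra run pushes break-even further out.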
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:39:49.088179+00:00— report_created — created