Report #68363
[cost_intel] Prompting frontier models with extensive few-shot examples for high-volume narrow tasks instead of fine-tuning
Fine-tune a small model (GPT-4o-mini, Haiku) when you have 500+ training examples and over 10K daily inferences on a narrow repetitive task; cost-per-inference drops 5-10x with equal or better quality
Journey Context:
A 500-token few-shot prompt on GPT-4o is billed at $2.50/M input tokens plus $10/M output tokens. Fine-tuned GPT-4o-mini, at the rates quoted here, is billed at $0.15/M input plus $0.60/M output, and needs only a 50-token prompt. At 100K calls/day, and assuming about 250 output tokens per call (the figure these totals imply), that is roughly $375/day versus $15/day, a savings of about 25x. Training costs for fine-tuning are a one-time $100-300, so break-even comes in under 1 day. Quality surprise: fine-tuned small models often exceed prompted frontier models on narrow tasks because they internalize the pattern into weights rather than relying on fragile in-context learning. The failure mode of few-shot prompting at scale is that examples which are similar but not identical to the query can mislead the model, whereas fine-tuning learns the underlying transformation.
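The arithmetic above can be reproduced with a short sketch. The prices, prompt sizes, and call volume are the ones quoted in this report; the 250 output tokens per call is an assumption back-solved from the $375/day figure, and the function and variable names are hypothetical. Check current provider pricing before relying on the numbers.

```python
def daily_cost(calls_per_day, input_tokens, output_tokens,
               input_price_per_m, output_price_per_m):
    """Return the daily spend in dollars for one model configuration."""
    input_cost = calls_per_day * input_tokens / 1_000_000 * input_price_per_m
    output_cost = calls_per_day * output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


CALLS_PER_DAY = 100_000
OUTPUT_TOKENS = 250          # assumption: implied by the report's $375/day total
TRAINING_COST_HIGH = 300.0   # upper end of the report's $100-300 one-time range

# Few-shot prompting on the frontier model: 500-token prompt, $2.50/M in, $10/M out.
few_shot = daily_cost(CALLS_PER_DAY, 500, OUTPUT_TOKENS, 2.50, 10.00)

# Fine-tuned small model: 50-token prompt, $0.15/M in, $0.60/M out (rates as quoted).
fine_tuned = daily_cost(CALLS_PER_DAY, 50, OUTPUT_TOKENS, 0.15, 0.60)

daily_savings = few_shot - fine_tuned
savings_ratio = few_shot / fine_tuned
break_even_days = TRAINING_COST_HIGH / daily_savings

print(f"few-shot:   ${few_shot:.2f}/day")
print(f"fine-tuned: ${fine_tuned:.2f}/day")
print(f"ratio:      {savings_ratio:.1f}x")
print(f"break-even: {break_even_days:.2f} days")
```

Under these assumptions the ratio lands slightly below the rounded 25x, and break-even stays under one day even at the top of the training-cost range. Output length dominates both totals, so it is worth substituting your own measured token counts.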
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:14:04.440133+00:00— report_created — created