Report #47937
[cost_intel] Persisting with few-shot prompting at high volume when fine-tuning would reduce costs by 90%
Calculate the crossover point: if daily token volume exceeds ~100k tokens with a consistent set of 4+ few-shot examples, fine-tune a smaller model (GPT-3.5-turbo or Llama-3-8B) to eliminate the few-shot examples entirely; at scale, the upfront training cost pays back within days.
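A minimal sketch of the crossover calculation, assuming per-1K-token list prices of $0.03 input / $0.06 output for GPT-4 and $0.003 input / $0.006 output for a fine-tuned GPT-3.5-turbo. The output prices are not part of this report, and all prices should be checked against current provider pricing. `Pricing`, `Workload`, `daily_cost` and `break_even_days` are hypothetical helpers, and training is treated as a one-off lump sum (no retraining or hosting costs).

```python
import math
from dataclasses import dataclass


@dataclass
class Pricing:
    input_per_1k: float   # USD per 1K input tokens
    output_per_1k: float  # USD per 1K output tokens


@dataclass
class Workload:
    requests_per_day: int
    example_tokens: int      # static few-shot tokens repeated on every request
    other_input_tokens: int  # instructions + the per-request input
    output_tokens: int       # completion tokens per request


# Assumed list prices (hypothetical constants); verify before relying on them.
GPT4 = Pricing(input_per_1k=0.03, output_per_1k=0.06)
FT_GPT35 = Pricing(input_per_1k=0.003, output_per_1k=0.006)


def daily_cost(w: Workload, price: Pricing, include_examples: bool) -> float:
    """USD per day; the fine-tuned path sends no few-shot examples."""
    input_per_request = w.other_input_tokens + (w.example_tokens if include_examples else 0)
    tokens_in = w.requests_per_day * input_per_request
    tokens_out = w.requests_per_day * w.output_tokens
    return tokens_in / 1000 * price.input_per_1k + tokens_out / 1000 * price.output_per_1k


def break_even_days(w: Workload, few_shot: Pricing, fine_tuned: Pricing,
                    training_cost: float) -> float:
    """Days of traffic until the one-off training cost is recovered."""
    savings = daily_cost(w, few_shot, True) - daily_cost(w, fine_tuned, False)
    if savings <= 0:
        return math.inf  # fine-tuning never pays back at this volume and pricing
    return training_cost / savings


if __name__ == "__main__":
    # 10k requests/day carrying 2k example tokens each (examples only).
    w = Workload(requests_per_day=10_000, example_tokens=2_000,
                 other_input_tokens=0, output_tokens=0)
    print(f"{daily_cost(w, GPT4, True):.2f}")                # 600.00
    print(f"{break_even_days(w, GPT4, FT_GPT35, 500):.2f}")  # 0.83

    # Sweep volume with 500 other input tokens and 200 output tokens per request.
    for rpd in (50, 1_000, 10_000):
        w = Workload(rpd, 2_000, 500, 200)
        print(rpd, round(break_even_days(w, GPT4, FT_GPT35, 500), 1))
```

Under these assumed prices and a $500 training run, the sweep shows payback stretching from well under a day at 10k requests/day to roughly four months at 50 requests/day (about 100k example tokens/day), so the result is worth re-running with your own volume and prices.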
Journey Context:
Teams avoid fine-tuning because of its upfront cost ($200-500) and complexity, and instead run 5-shot prompts on GPT-4. At 10k requests/day with 2k tokens of examples per request, that is 20M tokens/day of examples alone, costing $600/day at GPT-4's $0.03 per 1K input tokens (about $200/day at GPT-4 Turbo's $0.01 per 1K). A fine-tuned GPT-3.5-turbo costs $0.003 per 1K input tokens and needs no examples. At this volume, break-even often arrives in under 24 hours of traffic.

The hidden trap is 'example creep': teams keep adding 'just one more example' to fix edge cases, and per-request cost grows linearly with every addition. The fix is to measure the 'token tax per request' from few-shot prompting: if more than 30% of input tokens are static examples, switch to fine-tuning.

Quality signature to watch: if the model performs well with 5 shots but fails with 0 shots, the task is a strong fine-tuning candidate. For repetitive structured extraction tasks, generic GPT-4 with few-shot examples costs 10-50x more than a fine-tuned small model.
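One way to measure the 'token tax per request' and track example creep, as a minimal sketch. `PromptParts`, `token_tax`, `should_fine_tune` and `creep_curve` are hypothetical names, and `rough_token_count` only splits on whitespace as a stand-in for a real tokenizer (for OpenAI models, `tiktoken` gives actual counts and can be passed in as `count_tokens`). The 30% threshold is this report's rule of thumb, not a universal constant.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


def rough_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


@dataclass
class PromptParts:
    instructions: str
    examples: Sequence[str]  # static few-shot examples, identical on every request
    user_input: str


def token_tax(parts: PromptParts,
              count_tokens: Callable[[str], int] = rough_token_count) -> float:
    """Fraction of input tokens spent on static few-shot examples."""
    example_tokens = sum(count_tokens(e) for e in parts.examples)
    total = count_tokens(parts.instructions) + example_tokens + count_tokens(parts.user_input)
    return example_tokens / total if total else 0.0


def should_fine_tune(parts: PromptParts, threshold: float = 0.30,
                     count_tokens: Callable[[str], int] = rough_token_count) -> bool:
    """Rule of thumb: switch once the example tax exceeds the threshold."""
    return token_tax(parts, count_tokens) > threshold


def creep_curve(parts: PromptParts,
                count_tokens: Callable[[str], int] = rough_token_count) -> List[float]:
    """Token tax when keeping only the first k examples, for k = 0..len(examples)."""
    return [
        token_tax(PromptParts(parts.instructions, list(parts.examples[:k]), parts.user_input),
                  count_tokens)
        for k in range(len(parts.examples) + 1)
    ]


if __name__ == "__main__":
    parts = PromptParts(
        instructions="Extract the invoice number and total as JSON.",
        examples=["Input: Invoice 123, total $40. Output: {\"id\": 123, \"total\": 40}"] * 5,
        user_input="Invoice 456, total $75.",
    )
    print(round(token_tax(parts), 2), should_fine_tune(parts))
    print([round(t, 2) for t in creep_curve(parts)])
```

Logging `token_tax` per prompt version makes creep visible: each added example pushes the curve up, and the first version that crosses the threshold is the point to start collecting a fine-tuning dataset.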
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:56:48.996766+00:00— report_created — created