Report #95394
[cost\_intel] When does fine-tuning GPT-4o-mini beat few-shot prompting on cost per request
Fine-tune when daily request volume exceeds 10k and the few-shot prompt exceeds 500 tokens; expect roughly 40% lower cost and about half the latency compared with few-shot prompting
Journey Context:
Few-shot examples grow the prompt linearly with the number of examples \(n×example\_tokens\), and those tokens are billed on every request. Fine-tuning bakes the examples into the weights, so each inference call sends only the base prompt. Crossover math: at 10k requests/day with 1k tokens of few-shot context, the cumulative extra inference cost of few-shot prompting exceeds the amortized training cost within 1 month. The exact crossover also depends on the per-token price of the fine-tuned model, which can differ from the base model's price. Fine-tuned models additionally have lower latency \(fewer input tokens to process\) and higher consistency \(no variance from example selection\). Critical caveat: fine-tuning requires 50\+ training examples and adds maintenance overhead, so it is only viable for stable task definitions.
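The crossover reasoning above can be sketched as a small break-even calculator. This is a minimal sketch, not a pricing reference: the `Scenario` fields, the function names, and every price in the example are hypothetical placeholders, and the training cost is assumed to be a single one-time charge. Substitute current rates from the provider's pricing page before relying on the result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scenario:
    """Inputs for a few-shot vs fine-tuned comparison (all names are illustrative)."""
    requests_per_day: int
    base_prompt_tokens: int    # instructions + user input, sent in both setups
    few_shot_tokens: int       # n * example_tokens, sent only in the few-shot setup
    output_tokens: int         # completion length, assumed equal in both setups
    base_input_price: float    # USD per 1M input tokens, base model
    base_output_price: float   # USD per 1M output tokens, base model
    ft_input_price: float      # USD per 1M input tokens, fine-tuned model
    ft_output_price: float     # USD per 1M output tokens, fine-tuned model
    training_cost: float       # one-time fine-tuning cost in USD


def few_shot_cost_per_request(s: Scenario) -> float:
    """Cost of one request when the few-shot examples are sent every time."""
    input_tokens = s.base_prompt_tokens + s.few_shot_tokens
    return (input_tokens * s.base_input_price
            + s.output_tokens * s.base_output_price) / 1_000_000


def fine_tuned_cost_per_request(s: Scenario) -> float:
    """Cost of one request when only the base prompt is sent to the fine-tuned model."""
    return (s.base_prompt_tokens * s.ft_input_price
            + s.output_tokens * s.ft_output_price) / 1_000_000


def break_even_days(s: Scenario) -> Optional[float]:
    """Days until per-request savings repay the training cost.

    Returns None when the fine-tuned model is not cheaper per request,
    in which case fine-tuning never pays off on cost alone.
    """
    saving = few_shot_cost_per_request(s) - fine_tuned_cost_per_request(s)
    if saving <= 0:
        return None
    return s.training_cost / (saving * s.requests_per_day)


if __name__ == "__main__":
    # Placeholder numbers only: 10k requests/day and 1k few-shot tokens come from
    # the report; all prices and the training cost are made up for illustration.
    example = Scenario(
        requests_per_day=10_000,
        base_prompt_tokens=500,
        few_shot_tokens=1_000,
        output_tokens=100,
        base_input_price=1.0,
        base_output_price=2.0,
        ft_input_price=1.0,
        ft_output_price=2.0,
        training_cost=100.0,
    )
    print("few-shot   $/request:", few_shot_cost_per_request(example))
    print("fine-tuned $/request:", fine_tuned_cost_per_request(example))
    print("break-even days:", break_even_days(example))
```

Keeping the fine-tuned rates as separate parameters matters: if the fine-tuned model is billed at a higher per-token rate than the base model, the saving per request shrinks and `break_even_days` can move well past one month or return `None`.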
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:41:53.706441+00:00— report_created — created