Report #24598
[cost_intel] Determine when fine-tuning beats few-shot prompting on cost per quality point
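Fine-tune when monthly volume exceeds 50k requests AND each prompt requires >3 few-shot examples (>2k tokens of context). On training cost alone (typically $20-40, depending on training set size), raw break-even comes at roughly 4k-7k requests at about $0.0054 saved per request; the 50k threshold is deliberately conservative, leaving margin for maintenance and the risk of model degradation.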
Journey Context:
Few-shot prompting with 3 examples of 600 tokens each adds 1.8k tokens per request. At $3/1M input tokens (GPT-4o), that's $0.0054 per request in 'context tax.' Fine-tuning eliminates those 1.8k tokens, and this report additionally assumes fine-tuned GPT-4o input costs 50% less ($1.50/1M), a figure to verify against current pricing. Savings per request: (1800 * $3/1M) + (actual_input * $1.50/1M), or roughly $0.006+ per request. A $30 training cost therefore breaks even at about 5k requests ($30 / $0.006), but after accounting for maintenance and the risk of model degradation, the conservative threshold is 50k requests. The >3 examples threshold is where the token bloat becomes painful.
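The arithmetic above can be sketched as a small calculator. This is a minimal illustration, not a pricing tool. The function names `savings_per_request` and `break_even_requests` and their parameters are hypothetical, and the rates ($3/1M base, $1.50/1M fine-tuned) are the figures assumed in this report, not verified provider pricing.

```python
import math


def savings_per_request(fewshot_tokens, other_input_tokens,
                        base_rate_per_m, tuned_rate_per_m):
    """Dollars saved per request by dropping few-shot examples.

    Rates are dollars per 1M input tokens (assumed values, see report).
    """
    # Few-shot request pays the base rate on examples plus the real input.
    fewshot_cost = (fewshot_tokens + other_input_tokens) * base_rate_per_m / 1_000_000
    # Fine-tuned request pays the tuned rate on the real input only.
    tuned_cost = other_input_tokens * tuned_rate_per_m / 1_000_000
    return fewshot_cost - tuned_cost


def break_even_requests(training_cost, per_request_savings):
    """Requests needed to recoup training cost, or None if there are no savings."""
    if per_request_savings <= 0:
        return None
    return math.ceil(training_cost / per_request_savings)


if __name__ == "__main__":
    # Context tax only: 3 examples x 600 tokens, no other input counted.
    tax_only = savings_per_request(1800, 0, base_rate_per_m=3.0, tuned_rate_per_m=1.5)
    print(f"savings per request: ${tax_only:.4f}")
    print("break-even requests:", break_even_requests(30.0, tax_only))  # 5556
```

Passing a fine-tuned rate higher than the base rate shrinks the savings and pushes the break-even point out; if savings fall to zero or below, the function returns `None`, meaning training cost is never recouped on token cost alone.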
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:41:38.937933+00:00— report_created — created