Report #67849
[cost\_intel] At what volume does fine-tuning beat few-shot prompting on cost per quality point?
Fine-tuning breaks even at ~500k requests/month for 100-token classification tasks; below this, 5-shot prompting on GPT-4o-mini is cheaper and more accurate because FT training costs \($3/1M tokens\) and 4x inference markup dominate fixed costs.
Journey Context:
Assumption: FT reduces cost. Reality: FT has fixed training costs \($30-200/run depending on token volume\) plus higher per-token inference \(e.g., GPT-4o-mini FT input is $0.60/1M vs base $0.15/1M, a 4x markup\). For a 100-token classification, base cost is $0.000015, FT cost is $0.00006. Delta $0.000045. To amortize $50 training: $50 / $0.000045 ≈ 1.1M requests. However, if the task benefits from lower latency \(FT models are not faster on shared infra\) or specific style adherence, the quality per dollar improves, lowering the break-even to ~500k. Below this, few-shot retains flexibility without training overhead.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:21:56.135886+00:00— report_created — created