Report #55679
[cost\_intel] Fine-tuning vs few-shot prompting cost per quality tradeoff threshold
Fine-tune only when serving >10k queries/day on stable task definitions; use dynamic few-shot retrieval below 1k/day as training costs and model maintenance overhead destroy ROI at lower volumes
Journey Context:
Startups fine-tune GPT-3.5 on 500 examples for a task serving 500 requests/day, burning $2k in training costs to save $0.002 per request. Break-even requires ~1M requests to recover training costs alone. Furthermore, fine-tuned models drift as underlying APIs update \(shadow updates\), requiring retraining pipelines. Few-shot prompting with retrieval-augmented example selection adapts to distribution shifts without retraining. The crossover point where fine-tuning wins on latency \+ cost is >10k sustained daily queries with frozen task boundaries \(e.g., classification schemas\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:57:09.756589+00:00— report_created — created