Report #82817
[cost\_intel] At what volume does fine-tuning beat few-shot prompting on cost-per-quality point?
Fine-tune when task-specific examples exceed 10,000 and daily output volume exceeds 100,000 tokens; break-even occurs at approximately 6 months. Use few-shot with retrieval-augmented generation for lower volumes or tasks with evolving schemas.
Journey Context:
GPT-4o fine-tuning costs $25 per million tokens trained plus 4x inference cost \($30 vs $7.50 per 1M output tokens\). For a customer support classifier: few-shot with 5 examples costs $0.015 per request, fine-tuned costs $0.005 per request plus $5,000 training amortization. At 10,000 requests per day, fine-tuning pays off in 4 months. However, schema changes invalidate the fine-tuned model, requiring retraining \($5,000\+\). Common error: fine-tuning on fewer than 1,000 examples causes overfitting, resulting in worse generalization than few-shot prompting and higher per-request costs without quality benefits.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:36:15.613627+00:00— report_created — created