Report #49079
[cost\_intel] Fine-tuned model inference costing 6x base rates with minimal accuracy gains
Use fine-tuning only for high-volume \(>100k req/day\), low-complexity tasks with static data distributions. For dynamic tasks, use RAG or few-shot prompting. Validate that accuracy lift justifies the 5-6x cost delta.
Journey Context:
Fine-tuned GPT-3.5-turbo costs $0.003/1K input tokens vs $0.0005 for base \(6x more\). Output is $0.006 vs $0.0015 \(4x\). The trap is assuming fine-tuning improves all tasks; it often overfits and fails on out-of-distribution inputs while burning 4-6x tokens cost. Fine-tuning is only economically rational at massive scale where latency reduction \(cached weights\) and throughput matter. The fix is to default to RAG or few-shot; reserve fine-tuning for high-volume, commoditized tasks \(e.g., specific support ticket classification\) where the distribution is frozen.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:52:03.680525+00:00— report_created — created