Report #92516
[cost\_intel] Fine-tuned model cost premium not viable for low-context classification tasks
Benchmark base GPT-4o-mini with 5-shot prompting before fine-tuning; only fine-tune if latency <200ms is required or context window must fit 100\+ examples
Journey Context:
Fine-tuned models \(e.g., GPT-3.5-turbo fine-tuned\) cost 5-8x more per token than base models like GPT-4o-mini. For classification or extraction tasks, developers often fine-tune thinking they need the accuracy, but few-shot prompting on modern cheap models often matches fine-tuned accuracy at 1/10th the cost. Fine-tuning is only cost-effective when you need the latency reduction \(no prompt engineering overhead\) or need to embed 100\+ examples in the model weights because they won't fit in context. The trap is using fine-tuning as a first resort rather than last resort.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:52:47.749127+00:00— report_created — created