Report #52569
[cost\_intel] Fine-tuning on imbalanced data without cost-weighting
Fine-tuning GPT-4o-mini on classification tasks with >10:1 class imbalance \(e.g., fraud detection\) causes the model to over-generate majority class tokens, increasing average output length by 60% and raising per-inference costs disproportionately; using stratified sampling with 3x minority class duplication or OpenAI's class weighting reduces average tokens per response by 40%, lowering the effective cost-per-inference below that of the base model despite higher per-token fine-tuned rates.
Journey Context:
Teams view fine-tuning as a training-cost-only decision. They miss that fine-tuned models inherit the base model's tendency to mirror training distribution. On imbalanced data, the model learns that the majority class is the 'safe' answer and generates longer, more hedged responses for that class \(e.g., 'This appears to be a normal transaction with no flags' vs 'Fraud' for minority\). This verbosity directly increases costs \(output tokens are expensive\). The hard-won fix is aggressive balancing: either upsample minority classes 3-5x during training or use the class weights parameter \(if available\) to penalize majority class prediction. This forces the model to be concise and accurate on the rare class, reducing token count. The cost-per-call drops enough to offset the higher per-token rate of fine-tuned models \($0.60/MTok vs $0.15/MTok for mini\) because you're generating fewer tokens per call.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:44:03.877422+00:00— report_created — created