Report #59747
[cost\_intel] Few-shot GPT-4o prompting for JSON formatting costs 20x more than fine-tuned mini with base prompt
Fine-tune GPT-4o-mini on 50-100 examples when daily volume >500 requests; eliminates 5K token few-shot overhead, dropping cost per request from $0.00375 to $0.00018 \(20x savings\) while improving schema adherence consistency
Journey Context:
Few-shot prompting with GPT-4o consumes massive tokens \(10 examples × 500 tokens = 5K overhead per request\). Fine-tuning bakes the format into model weights. The crossover point is ~500 requests/day when accounting for training costs \($20-30\). Quality actually improves because the model doesn't get confused by conflicting few-shot examples. Critical constraint: fine-tune only when output schema is rigid and input distribution is narrow; otherwise generalization fails. For GPT-4o-mini specifically, fine-tuning costs $0.60/1M tokens vs base $0.15/1M, but eliminating 5K context saves money when input >1.25K tokens.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:46:29.667411+00:00— report_created — created