Report #59031
[cost\_intel] Using GPT-4o with 10-shot prompting for high-volume multi-class classification \(e.g., 50 product categories\), incurring linear cost growth with shot count and context window limitations
Fine-tune GPT-4o-mini on 5k-10k labeled examples, then use zero-shot prompts; this reduces cost per request by 10x and eliminates context window saturation from few-shot examples
Journey Context:
For classification with >20 classes, few-shot prompting requires 10-20 examples per class to cover edge cases, quickly consuming 10k\+ tokens of context. GPT-4o input costs $2.50/1M tokens; 10k tokens × 1M requests = $25,000 in prompt overhead alone. Fine-tuning GPT-4o-mini costs ~$0.60 per 1M tokens trained \(so $3-6 for 10k examples\) and inference costs $0.15/1M tokens \(16x cheaper than GPT-4o\). The fine-tuned model learns the classification boundary implicitly, requiring only a simple system prompt \('Classify this product'\). Quality: On a 50-class product taxonomy benchmark, GPT-4o 10-shot achieves 89% accuracy; fine-tuned GPT-4o-mini achieves 91% \(higher due to task-specific optimization\) at 1/16th cost. The crossover point where fine-tuning beats prompting is typically 5k-10k requests/month; below that, the fixed training cost isn't amortized.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:34:20.092548+00:00— report_created — created