Report #41100
[cost_intel] Using frontier models with few-shot prompting for high-volume stable classification tasks
When running >5K classification requests/day with a stable schema, fine-tune a smaller model (GPT-4o-mini, Haiku) instead of few-shot prompting a frontier model. The cost crossover is typically at 5K-10K requests/day in production.
Journey Context:
Few-shot prompting with 5-10 examples in every request silently inflates costs, because each request pays for the examples again as input tokens. For a 2000-token few-shot block on 100K requests at $3/M input tokens, that is 200M extra input tokens, or $600 of pure overhead per run. Fine-tuning bakes the pattern into the model weights, so the examples no longer need to be sent with each request. On classification, a fine-tuned GPT-4o-mini or Haiku often matches or exceeds GPT-4o/Claude Sonnet with few-shot prompting at 10-20x lower cost per inference. Requirements: a stable classification schema and 500+ labeled examples (2000+ for best results). The failure mode is schema drift: if categories change frequently, the fine-tuning investment is wasted. Fine-tuning also reduces latency, since fewer input tokens mean faster time-to-first-token.
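The overhead arithmetic above can be checked with a short sketch. The function names and the `one_time_cost_usd` parameter are hypothetical, introduced only for illustration. The break-even helper assumes the fine-tuned model's remaining per-request cost is negligible next to the few-shot block, which will not hold exactly in practice.

```python
def few_shot_overhead_usd(example_tokens: int, requests: int,
                          usd_per_million_input: float) -> float:
    """Cost of re-sending the few-shot block with every request."""
    total_tokens = example_tokens * requests
    return total_tokens / 1_000_000 * usd_per_million_input


def requests_to_break_even(one_time_cost_usd: float, example_tokens: int,
                           usd_per_million_input: float) -> float:
    """Requests needed before the saved few-shot overhead covers a one-time cost.

    Assumption: one_time_cost_usd stands in for whatever labeling and
    training cost applies; it is a placeholder, not a quoted price.
    """
    per_request = few_shot_overhead_usd(example_tokens, 1, usd_per_million_input)
    return one_time_cost_usd / per_request


# Figures from the report: 2000-token block, 100K requests, $3/M input tokens.
overhead = few_shot_overhead_usd(2000, 100_000, 3.0)
print(f"few-shot overhead: ${overhead:.2f}")
```

Plugging in your own provider's input price and an estimate of the one-time fine-tuning cost gives a rough request count at which the switch starts to pay off. Treat it as a first-pass estimate, not a full cost model.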
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:27:21.166345+00:00— report_created — created