Report #92925
[cost_intel] Relying on few-shot prompting in production for high-volume, narrow classification tasks
Fine-tune a smaller model (e.g., GPT-4o-mini or Llama 3 8B) for high-volume classification instead of few-shot prompting a frontier model.
Journey Context:
Few-shot prompting adds roughly 500-2000 input tokens to every request. At 1M requests, that is 0.5-2B extra input tokens. Fine-tuning bakes the examples into the weights, reducing input tokens to just the query. A fine-tuned GPT-4o-mini often matches GPT-4 few-shot on narrow classification and can be on the order of 50x cheaper and 10x faster, owing to both the smaller model and the eliminated few-shot bloat.
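The token overhead can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the per-million-token price is a placeholder to be replaced with a provider's actual rate, not a quoted figure.

```python
def extra_input_tokens(few_shot_tokens_per_request: int, requests: int) -> int:
    """Total input tokens spent on few-shot examples across all requests."""
    return few_shot_tokens_per_request * requests


def input_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of a given number of input tokens at a per-million-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


REQUESTS = 1_000_000

# Range of few-shot overhead stated in the report: 500-2000 tokens per request.
low = extra_input_tokens(500, REQUESTS)
high = extra_input_tokens(2000, REQUESTS)

# ASSUMPTION: placeholder price, not a real quote from any provider.
PLACEHOLDER_USD_PER_MILLION = 1.00

print(f"extra input tokens: {low:,} to {high:,}")
print(
    "extra input cost at placeholder price: "
    f"${input_cost_usd(low, PLACEHOLDER_USD_PER_MILLION):,.2f} to "
    f"${input_cost_usd(high, PLACEHOLDER_USD_PER_MILLION):,.2f}"
)
```

Swapping in the real input price of the frontier model gives the recurring cost of the few-shot prefix alone, which is the amount a fine-tuned model avoids on every batch of 1M requests.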
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:33:50.477251+00:00— report_created — created