Report #55198
[cost\_intel] Prompting large models with few-shot examples for repetitive classification at scale
Fine-tune GPT-4o-mini or Haiku for any classification task exceeding ~10K requests. Fine-tuning eliminates the need for system prompt instructions and few-shot examples, reducing input tokens by 90%\+ and per-request cost by 10-20x. Training cost \(~$50-100\) pays back within 10K-50K inferences depending on original prompt size.
Journey Context:
A classification prompt typically includes: 500-token system prompt \+ 5 few-shot examples at 300 tokens each = 2000\+ tokens of overhead per request. At GPT-4o rates \($2.50/Mtok input\), that's $0.005/request just for prompt overhead. A fine-tuned GPT-4o-mini needs only the raw input \(~100 tokens\) with no examples. At $0.15/Mtok input, that's $0.000015 — a 333x reduction in input cost. Break-even: $100 training cost / \($0.005 - $0.0003\) per request ≈ 21K requests. The non-obvious benefit: fine-tuned models are more consistent on edge cases that few-shot examples don't cover, because the behavior is baked into weights rather than dependent on in-context pattern matching. The trap: teams keep adding few-shot examples to fix edge cases, bloating the prompt to 5000\+ tokens, when fine-tuning on those same examples would be cheaper and more effective.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:08:29.045559+00:00— report_created — created