Report #74587
[cost_intel] Few-shot examples silently inflating token costs with diminishing returns
Cap few-shot examples at 2-3 for classification/extraction tasks: beyond 3 examples, quality gains plateau at <1%, while input token costs increase 3-5x. Use a few diverse, high-quality examples rather than many mediocre ones.
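The diversity rule above can be sketched as a small greedy selector. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes a labeled pool in which someone has hand-flagged edge cases, and `Example`, `select_few_shot`, the `edge_case` flag and the word-overlap (Jaccard) distance are all hypothetical names chosen for the sketch. An embedding-based distance would likely be a better diversity measure in practice; word overlap just keeps the example dependency-free. At each step the loop prefers, in order: a label not yet covered, an edge-case example, and the candidate farthest from everything already chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    text: str
    label: str
    edge_case: bool = False  # hypothetical flag, set by whoever curates the pool


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A & B| / |A | B| over lowercase word sets; a cheap stand-in for embedding distance."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def select_few_shot(pool: list[Example], k: int = 3) -> list[Example]:
    """Greedily pick up to k examples: cover new labels first, then edge cases, then diversity."""
    chosen: list[Example] = []
    remaining = list(pool)
    while remaining and len(chosen) < k:
        seen_labels = {ex.label for ex in chosen}

        def score(ex: Example) -> tuple[bool, bool, float]:
            # Distance to the closest already-chosen example (1.0 when nothing is chosen yet).
            min_dist = min((jaccard_distance(ex.text, c.text) for c in chosen), default=1.0)
            # Tuple comparison encodes the priority order: unseen label, edge case, diversity.
            return (ex.label not in seen_labels, ex.edge_case, min_dist)

        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Usage with a small hypothetical pool of support-ticket examples.
pool = [
    Example("refund my order please", "refund"),
    Example("i want a refund for my order", "refund"),
    Example("where is my package", "shipping"),
    Example("package arrived damaged, want refund or replacement", "refund", edge_case=True),
    Example("cancel my subscription", "account"),
]
picked = select_few_shot(pool, k=3)
print([ex.label for ex in picked])  # ['refund', 'account', 'shipping']
```

The two near-duplicate "refund" examples are skipped in favour of the edge case plus one example from each remaining category, which is the "one edge case is worth three of the same pattern" trade-off expressed as a selection rule.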
Journey Context:
Teams routinely add 5-10 few-shot examples to prompts, adding 500-2000+ input tokens per request. The quality curve for few-shot classification drops off sharply: going from 0 to 1 shot yields a 3-8% improvement, 1 to 2 shots yields 1-3%, 2 to 3 shots yields 0.5-1%, and improvements beyond 3 shots are <0.5%. The token cost, by contrast, grows linearly: at 1M requests/month on GPT-4o, 5 extra examples at 200 tokens each add 1,000 input tokens per request, i.e. 1B extra input tokens per month, or $2,500/month in pure few-shot token cost (these totals correspond to $2.50 per 1M input tokens). On GPT-4o-mini ($0.15 per 1M input tokens), the same pattern costs $150/month. The fix: select 2-3 maximally diverse examples that cover edge cases and different categories. Example quality matters more than quantity: one example demonstrating an edge case is worth three showing the same pattern. If you have 500+ good examples, fine-tune instead; at scale it is more effective and cheaper than in-context learning.
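The cost arithmetic above can be reproduced in a few lines. The per-1M-token prices are assumptions back-derived from the report's own totals ($2.50 for GPT-4o input, $0.15 for GPT-4o-mini input); check the provider's current price list before reusing them. `monthly_few_shot_cost` and `ASSUMED_PRICES` are hypothetical names for this sketch.

```python
def monthly_few_shot_cost(
    requests_per_month: int,
    extra_examples: int,
    tokens_per_example: int,
    usd_per_million_input_tokens: float,
) -> float:
    """USD per month spent only on the input tokens of the extra few-shot examples."""
    extra_tokens_per_request = extra_examples * tokens_per_example
    extra_tokens_per_month = requests_per_month * extra_tokens_per_request
    return extra_tokens_per_month / 1_000_000 * usd_per_million_input_tokens


# Assumed input prices in USD per 1M tokens, back-derived from the report's totals.
ASSUMED_PRICES = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}

for model, price in ASSUMED_PRICES.items():
    cost = monthly_few_shot_cost(1_000_000, 5, 200, price)
    print(f"{model}: ${cost:,.0f}/month")
# gpt-4o: $2,500/month
# gpt-4o-mini: $150/month
```

Because the cost is linear in every factor, trimming 5 surplus examples removes the full amount, while, per the report's curve, each example removed beyond the third gives up less than 0.5% in quality.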
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:47:41.506431+00:00 - report_created - created