Report #73552
[cost_intel] Few-shot examples silently inflating token costs in high-volume pipelines
Limit few-shot examples to a maximum of 2-3 for high-volume tasks. Each example adds 100-500 tokens to every request. At 1M requests/month, 8 extra examples at 300 tokens each add 2.4B input tokens, or $7,200/month on Sonnet (at $3/1M input tokens), typically for less than 2% marginal quality improvement over 2-3 examples. Consider fine-tuning instead when the few-shot count exceeds 3 and volume exceeds 50K requests/month.
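The cost arithmetic and the rule of thumb above can be reproduced with a small helper. This is a minimal sketch that counts only the input tokens added by the examples. The function names are hypothetical, and the $3/1M figure is the Sonnet input price this report uses, not a live price lookup.

```python
def few_shot_monthly_cost(
    n_examples: int,
    tokens_per_example: int,
    requests_per_month: int,
    price_per_million: float,
) -> float:
    """Dollar cost per month of the few-shot examples alone (input tokens only)."""
    extra_tokens = n_examples * tokens_per_example * requests_per_month
    return extra_tokens / 1_000_000 * price_per_million


def should_consider_fine_tuning(n_examples: int, requests_per_month: int) -> bool:
    """Rule of thumb from this report: more than 3 examples and >50K requests/month."""
    return n_examples > 3 and requests_per_month > 50_000


if __name__ == "__main__":
    # 8 extra examples x 300 tokens x 1M requests at $3 per 1M input tokens
    print(few_shot_monthly_cost(8, 300, 1_000_000, 3.0))  # 7200.0
    print(should_consider_fine_tuning(8, 1_000_000))      # True
```

Plugging in 10 examples instead of 8 gives the $9,000/month figure used in the Journey Context.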
Journey Context:
Few-shot examples improve quality most in the step from 0 to 2 examples (typically a 5-15% improvement). The 3rd through 10th examples typically add less than 2% cumulative improvement. But every example is billed as input tokens on every single request, for as long as the prompt stays in use. The math: 10 examples × 300 tokens × 1M requests/month = 3B input tokens. At $3/1M (Sonnet), that is $9,000/month in few-shot token costs alone. Alternative: fine-tune on those examples instead. A fine-tuned GPT-4o-mini with 0 few-shot examples often matches or exceeds GPT-4o with 10 few-shot examples at a fraction of the per-request cost, because the learned behavior is baked into the weights rather than paid for as prompt tokens on each request. The break-even for fine-tuning vs. few-shot prompting is typically 10K-50K requests, depending on task complexity and training cost.
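The break-even claim can be checked with a rough sketch: divide the one-time training cost by the per-request input-token saving. All prices, token counts, and the training cost in the example call are hypothetical placeholders, not quoted rates. The sketch also ignores output tokens and any hosting fees, so treat its result as a lower-bound estimate under those assumptions.

```python
import math


def per_request_input_cost(prompt_tokens: int, price_per_million: float) -> float:
    """Input-token cost of a single request, in dollars."""
    return prompt_tokens * price_per_million / 1_000_000


def break_even_requests(
    training_cost: float,
    few_shot_prompt_tokens: int,
    few_shot_price_per_million: float,
    tuned_prompt_tokens: int,
    tuned_price_per_million: float,
) -> float:
    """Number of requests after which a one-time fine-tuning cost is recovered.

    Only input tokens are counted. Returns math.inf when the fine-tuned
    setup is not cheaper per request, since it would never break even.
    """
    saving = per_request_input_cost(
        few_shot_prompt_tokens, few_shot_price_per_million
    ) - per_request_input_cost(tuned_prompt_tokens, tuned_price_per_million)
    if saving <= 0:
        return math.inf
    return math.ceil(training_cost / saving)


if __name__ == "__main__":
    # Hypothetical inputs: $50 training run; 500-token task prompt plus
    # 10 x 300-token examples at $3/1M, versus the bare 500-token prompt
    # on a fine-tuned model at an assumed $0.30/1M.
    print(break_even_requests(50.0, 3_500, 3.0, 500, 0.30))  # 4831
```

Pass the provider's current rates for both sides, since a fine-tuned model may be billed at a different per-token rate than its base model; a smaller price gap or fewer examples pushes the break-even point later.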
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:03:15.706804+00:00— report_created — created