Report #59390
[cost_intel] Few-shot examples bloat token costs 10x with diminishing returns: how many shots before you're wasting money?
Cap few-shot examples at 2-3. Quality gains plateau sharply after 3 examples, with each additional example adding only 1-3%, while token costs scale linearly with the number of examples. For tasks that need more than 3 examples, switch to fine-tuning or to retrieval-augmented few-shot, which includes only the examples most relevant to each query.
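Here is a minimal sketch of retrieval-augmented (dynamic) few-shot selection. It assumes each stored example already has an embedding vector from some embedding model. An in-memory list with brute-force cosine similarity stands in for a real vector store. The names `Example`, `select_examples` and `build_prompt`, the `Input:`/`Output:` prompt layout and the toy 2-D embeddings are all hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    input_text: str
    output_text: str
    embedding: list[float]  # assumption: precomputed by an embedding model of your choice


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_examples(query_embedding: list[float], pool: list[Example], k: int = 3) -> list[Example]:
    """Return the k examples most similar to the query (k defaults to the 3-shot cap)."""
    ranked = sorted(pool, key=lambda ex: cosine_similarity(query_embedding, ex.embedding), reverse=True)
    return ranked[:k]


def build_prompt(instruction: str, query: str, query_embedding: list[float],
                 pool: list[Example], k: int = 3) -> str:
    """Assemble the instruction, the top-k retrieved shots, and the live query."""
    parts = [instruction]
    for ex in select_examples(query_embedding, pool, k):
        parts.append(f"Input: {ex.input_text}\nOutput: {ex.output_text}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical example pool with toy 2-D embeddings (axis 0 ~ billing, axis 1 ~ bugs).
pool = [
    Example("Where is my refund?", "billing", [0.9, 0.1]),
    Example("App crashes on launch", "bug", [0.1, 0.9]),
    Example("I was charged twice", "billing", [0.8, 0.2]),
    Example("Button misaligned on iOS", "bug", [0.2, 0.8]),
    Example("Update my card details", "billing", [0.7, 0.3]),
]

prompt = build_prompt("Classify the support ticket.", "Refund still not received", [1.0, 0.0], pool)
print(prompt)  # contains only the 3 billing examples, then the query
```

In production, the sort over `pool` would usually be replaced by a nearest-neighbour query against the vector store. Either way, the cap of `k=3` keeps per-call input tokens flat no matter how large the example pool grows.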
Journey Context:
The common pattern is dumping 10-20 examples into a prompt 'for safety.' Each example is typically 200-500 tokens, so 10 examples on GPT-4 add 2k-5k input tokens to every call. At 1M calls per month, that is 2B-5B extra input tokens. At an assumed rate of $7.50 per 1M input tokens, the excess input cost is $15k-37.5k per month; scale by your model's actual input price.
The quality curve from the GPT-3 paper is roughly logarithmic in the number of examples. The first 2 examples provide ~80% of the few-shot benefit, example 3 adds ~10%, and examples 4+ add 1-3% each. A telltale sign of over-shot prompting is a prompt longer than 2k tokens where removing examples 4+ changes output quality by less than 1%.
Dynamic few-shot retrieves the 3 most relevant examples per query from a vector store. It gets the quality of targeted examples without the static bloat, though it adds retrieval latency and infrastructure cost.
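The cost arithmetic and the over-shot signature can be checked with a few lines of Python. This is a sketch under stated assumptions:

- The function names are hypothetical.
- The $7.50 per 1M input tokens rate is assumed, because it is the rate that reproduces the report's $15k-37.5k range. Substitute your model's current price.
- "Less than 1%" is read as an absolute change of 0.01 on an eval score in the 0-1 range.

```python
def excess_input_cost_usd(num_examples: int, tokens_per_example: int,
                          calls_per_month: int, usd_per_million_tokens: float) -> float:
    """Monthly cost of the input tokens that few-shot examples add to every call."""
    extra_tokens_per_call = num_examples * tokens_per_example
    extra_tokens_per_month = extra_tokens_per_call * calls_per_month
    return extra_tokens_per_month / 1_000_000 * usd_per_million_tokens


def looks_overshot(prompt_tokens: int, score_all_shots: float, score_first_3: float,
                   max_tokens: int = 2000, max_quality_delta: float = 0.01) -> bool:
    """Report's heuristic: prompt over 2k tokens AND dropping examples 4+ moves quality
    by less than 1% (assumption: 0.01 absolute on a 0-1 eval score)."""
    return prompt_tokens > max_tokens and abs(score_all_shots - score_first_3) < max_quality_delta


RATE = 7.50  # assumed USD per 1M input tokens, not an official GPT-4 price

# Report scenario: 10 examples of 200-500 tokens each, 1M calls per month.
low = excess_input_cost_usd(10, 200, 1_000_000, RATE)
high = excess_input_cost_usd(10, 500, 1_000_000, RATE)
print(f"${low:,.0f} - ${high:,.0f} per month")  # $15,000 - $37,500 per month

# Savings from trimming the 7 examples beyond the 3-shot cap.
saved_low = excess_input_cost_usd(10 - 3, 200, 1_000_000, RATE)
saved_high = excess_input_cost_usd(10 - 3, 500, 1_000_000, RATE)
print(f"saved: ${saved_low:,.0f} - ${saved_high:,.0f} per month")  # saved: $10,500 - $26,250 per month
```

For `looks_overshot`, the two scores would come from running the same eval set with all examples and with only the first 3. Output quality is exactly what the heuristic measures, so per-call token counts alone are not enough.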
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:10:35.048869+00:00— report_created — created