Report #58531
[cost_intel] Using 10+ few-shot examples that provide less than 2% quality gain while linearly increasing input token cost
Test quality with 0, 1, 3, and 5 examples; quality typically plateaus at 3-5 shots. Each example adds 200-500 input tokens, so optimizing shot count is a direct cost lever for high-volume pipelines.
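A minimal sketch of the 0/1/3/5 sweep described above, assuming you already have a scoring callback (the `evaluate` argument here is hypothetical) that runs a labeled eval set with the first n examples in the prompt and returns accuracy in [0, 1]. The selection rule is also an assumption: it picks the smallest shot count whose score is within an absolute tolerance of the best, with the 2-point default mirroring the threshold in the title.

```python
from typing import Callable, Dict, Sequence, Tuple


def pick_shot_count(
    evaluate: Callable[[int], float],
    candidates: Sequence[int] = (0, 1, 3, 5),
    tolerance: float = 0.02,
) -> Tuple[int, Dict[int, float]]:
    """Pick the smallest shot count whose score is within `tolerance` of the best.

    `evaluate` is a hypothetical callback: it should run your labeled eval set
    with the first n examples in the prompt and return accuracy in [0, 1].
    """
    # Score every candidate shot count once.
    scores = {n: evaluate(n) for n in candidates}
    best = max(scores.values())
    # Among the near-best counts, take the cheapest (fewest examples).
    chosen = min(n for n, s in scores.items() if s >= best - tolerance)
    return chosen, scores


# Placeholder scores for illustration only (not measurements), shaped like a plateau.
placeholder = {0: 0.70, 1: 0.82, 3: 0.88, 5: 0.89, 10: 0.895}
n, scores = pick_shot_count(placeholder.__getitem__, candidates=(0, 1, 3, 5, 10))
print(n)  # 3
```

Preferring the smallest near-best count, rather than the single highest score, builds the cost trade-off into the choice: every example dropped saves its 200-500 input tokens on every request.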
Journey Context:
Few-shot quality gains follow a roughly logarithmic curve: going from 0 to 1 shot often yields a 10-20% accuracy improvement, 1 to 3 yields 5-10%, and beyond 5 examples the additional gain is typically less than 2%. Token cost, meanwhile, scales linearly with each example. On Haiku/Flash, more than 5-7 examples can actually hurt quality through attention dilution: the model over-attends to the patterns in the examples and under-attends to the actual query. On frontier models the degradation is less severe, but the cost impact is larger because their input tokens are more expensive.
Concrete cost calculation: for a pipeline processing 100K requests/day, reducing from 10 to 3 examples at 400 tokens/example saves 7 × 400 = 2,800 input tokens per request, or 280M input tokens/day. At Sonnet pricing ($3/M input tokens), that is $840/day, or roughly $306,600/year in pure waste. With prompt caching the savings are smaller (cached examples are cheap to re-read), but the quality improvement from reduced attention dilution remains. The optimal number of examples is task-dependent but almost never exceeds 5.
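A small calculator for the shot-reduction arithmetic, as a sketch using the scenario's inputs (100K requests/day, 10 to 3 examples, 400 tokens/example, $3 per million input tokens). The function name and the `price_multiplier` parameter are illustrative assumptions: a multiplier below 1.0 roughly approximates examples served from a prompt cache at a discounted read rate (check your provider's current pricing), and cache-write costs are ignored.

```python
from typing import Tuple


def shot_reduction_savings(
    requests_per_day: int,
    shots_before: int,
    shots_after: int,
    tokens_per_example: int,
    usd_per_million_input: float,
    price_multiplier: float = 1.0,
) -> Tuple[int, float, float]:
    """Return (input tokens saved per day, USD saved per day, USD saved per year).

    price_multiplier < 1.0 is an assumed stand-in for cached-read pricing;
    it does not model cache-write costs or cache misses.
    """
    # Tokens removed from every request, times request volume.
    tokens_saved = requests_per_day * (shots_before - shots_after) * tokens_per_example
    # Convert to dollars at the per-million input price (optionally discounted).
    usd_per_day = tokens_saved * usd_per_million_input * price_multiplier / 1_000_000
    return tokens_saved, usd_per_day, usd_per_day * 365


tokens, per_day, per_year = shot_reduction_savings(100_000, 10, 3, 400, 3.0)
print(f"{tokens:,} tokens/day, ${per_day:,.2f}/day, ${per_year:,.0f}/year")
# 280,000,000 tokens/day, $840.00/day, $306,600/year
```

Passing a `price_multiplier` below 1.0 shows why caching shrinks the dollar savings while leaving the token count, and therefore the attention-dilution argument, unchanged.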
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:44:05.127341+00:00— report_created — created