Report #81780
[counterintuitive] Why do few-shot examples sometimes make the model's output worse instead of better?
Evaluate few-shot against zero-shot empirically for each specific task; use the minimum number of examples needed; keep the examples highly consistent with one another in format and style; and for tasks where the model already has strong capabilities, consider zero-shot with clear instructions instead of few-shot.
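A minimal sketch of the empirical comparison recommended above. It sweeps the number of demonstrations, treating k=0 as zero-shot, and keeps the smallest k whose score is within a tolerance of the best. The prompt template, the `ks` grid, the 0.01 tolerance, exact-match scoring, and `fake_predict` are all hypothetical choices for illustration; a real run would replace `fake_predict` with an actual model call and use a held-out evaluation set for the task.

```python
from typing import Callable, Sequence

# One labeled item: (input text, expected output). Used for demos and eval data.
Example = tuple[str, str]


def build_prompt(instruction: str, shots: Sequence[Example], query: str) -> str:
    """Instruction, then k demonstrations, then the query, all in one fixed format."""
    parts = [instruction]
    for text, label in shots:
        parts.append(f"Input: {text}\nOutput: {label}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


def accuracy(predict: Callable[[str], str], instruction: str,
             shots: Sequence[Example], eval_set: Sequence[Example]) -> float:
    """Exact-match accuracy of `predict` over eval_set using the given shots."""
    hits = sum(
        predict(build_prompt(instruction, shots, text)).strip() == label
        for text, label in eval_set
    )
    return hits / len(eval_set)


def pick_shot_count(predict: Callable[[str], str], instruction: str,
                    pool: Sequence[Example], eval_set: Sequence[Example],
                    ks: Sequence[int] = (0, 1, 2, 4, 8),
                    tolerance: float = 0.01) -> tuple[int, dict[int, float]]:
    """Score every k in ks (k=0 is zero-shot) and return the smallest k whose
    accuracy is within `tolerance` of the best, together with all scores."""
    scores = {k: accuracy(predict, instruction, pool[:k], eval_set) for k in ks}
    best = max(scores.values())
    chosen = min(k for k, s in scores.items() if s >= best - tolerance)
    return chosen, scores


# Hypothetical stand-in for a real model call. It answers correctly with at most
# 2 demos and collapses to a single label with more, mimicking a non-monotonic curve.
def fake_predict(prompt: str) -> str:
    n_shots = prompt.count("Output:") - 1
    query = prompt.rsplit("Input: ", 1)[1].split("\n")[0]
    if n_shots > 2:
        return "negative"
    return "positive" if "good" in query else "negative"


POOL = [(f"{w} item {i}", "positive" if w == "good" else "negative")
        for i, w in enumerate(["good", "bad"] * 4)]
EVAL_SET = [("good film", "positive"), ("bad film", "negative"),
            ("good song", "positive"), ("bad song", "negative")]

chosen, scores = pick_shot_count(fake_predict, "Classify the sentiment.", POOL, EVAL_SET)
print(chosen, scores)
```

With this stub the sweep prints `0 {0: 1.0, 1: 1.0, 2: 1.0, 4: 0.5, 8: 0.5}`, so zero-shot is selected because the extra examples add nothing. Choosing the smallest k within tolerance, rather than the single best score, encodes the advice to use the minimum number of examples needed.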
Journey Context:
The common belief is that providing examples always helps the model understand what you want. Min et al. (2022) reported a counterintuitive finding: on a range of classification and multiple-choice tasks, replacing the gold labels in few-shot demonstrations with random labels only marginally lowers performance. The model does not primarily learn the input-label mapping from demonstrations; it benefits from the format, the domain of the inputs, and the label space that the examples establish. This means few-shot examples can hurt when they conflict with the model's pre-trained behavior, take up context space needed for the task, introduce inconsistent patterns the model overfits to, or shift the output distribution in the wrong direction. Few-shot performance is also non-monotonic: beyond some point, adding more examples can lower performance, plausibly because attention is diluted across the demonstrations instead of focusing on the actual query.
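To check whether a given task behaves like the ones in Min et al. (2022), the random-label ablation described above can be rerun on it. A minimal sketch, assuming a hypothetical `score` callable that formats the given demonstrations into your prompt and returns accuracy on held-out data; `format_only_score` below is a stub that ignores whether labels match their inputs, not a real model. A small gap suggests the demonstrations mostly contribute format, input domain, and label space; a large gap means the labels themselves matter for this task.

```python
import random
from typing import Callable, Sequence

Example = tuple[str, str]  # (input text, label)


def randomize_labels(shots: Sequence[Example], label_space: Sequence[str],
                     seed: int = 0) -> list[Example]:
    """Keep every demonstration's input text and order, but replace its label
    with one drawn uniformly at random from the label space."""
    rng = random.Random(seed)
    labels = list(label_space)
    return [(text, rng.choice(labels)) for text, _ in shots]


def label_sensitivity(score: Callable[[Sequence[Example]], float],
                      shots: Sequence[Example], label_space: Sequence[str],
                      n_trials: int = 5) -> float:
    """Gold-label score minus the mean score over n_trials random-label runs.
    A gap near zero suggests the demos help through format, input domain and
    label space rather than through the input-label mapping."""
    gold = score(shots)
    randomized = [score(randomize_labels(shots, label_space, seed=s))
                  for s in range(n_trials)]
    return gold - sum(randomized) / n_trials


LABELS = ["positive", "negative"]
SHOTS = [("great acting", "positive"), ("dull plot", "negative"),
         ("loved it", "positive"), ("waste of time", "negative")]


# Hypothetical stub, not a model: it only checks that every label comes from
# the label space, so it cannot tell correct labels from random ones.
def format_only_score(shots: Sequence[Example]) -> float:
    return 0.75 if all(label in LABELS for _, label in shots) else 0.0


gap = label_sensitivity(format_only_score, SHOTS, LABELS)
print(f"label sensitivity: {gap:.2f}")
```

Averaging over several seeds keeps one lucky or unlucky random labeling from dominating the comparison; with a real scorer, the gap would be measured on the same held-out set used for the shot-count sweep.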
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:52:03.313150+00:00— report_created — created