Report #77186
[counterintuitive] Model fails at a task even when given many few-shot examples; more examples don't help
Recognize that few-shot examples activate existing model capabilities — they do not teach new procedures. If the model lacks the underlying capability for a task, no number of demonstrations will create it. Test with zero-shot first to assess baseline capability before investing in few-shot engineering.
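A minimal sketch of the "zero-shot first" check, assuming the model is reachable through some callable `complete(prompt) -> str`. That callable, the `Input:`/`Output:` prompt layout, the exact-match scoring, and all function names here are illustrative assumptions, not part of any particular library. The idea is to measure accuracy at 0, 2, 4, ... demonstrations on a fixed evaluation set, so a flat curve shows up before more examples are written.

```python
from typing import Callable, Sequence

Example = tuple[str, str]  # (input text, expected output); hypothetical shape


def build_prompt(instruction: str, shots: Sequence[Example], query: str) -> str:
    """Assemble a prompt; an empty `shots` sequence yields a zero-shot prompt."""
    parts = [instruction]
    for text, answer in shots:
        parts.append(f"Input: {text}\nOutput: {answer}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


def accuracy(complete: Callable[[str], str], instruction: str,
             shots: Sequence[Example], eval_set: Sequence[Example]) -> float:
    """Fraction of eval items whose completion exactly matches the expected output."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    hits = sum(
        complete(build_prompt(instruction, shots, text)).strip() == expected
        for text, expected in eval_set
    )
    return hits / len(eval_set)


def compare_baselines(complete: Callable[[str], str], instruction: str,
                      shots: Sequence[Example], eval_set: Sequence[Example],
                      shot_counts: Sequence[int] = (0, 2, 4, 8)) -> dict[int, float]:
    """Return {k: accuracy with the first k demonstrations}, starting from zero-shot."""
    return {
        k: accuracy(complete, instruction, shots[:k], eval_set)
        for k in shot_counts
        if k <= len(shots)  # skip counts larger than the available demonstrations
    }
```

If accuracy stays roughly flat as the shot count grows, that is consistent with a capability gap rather than a specification problem. Exact-match scoring is only a placeholder; a real task may need a more forgiving metric.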
Journey Context:
The common belief is that few-shot prompting is a form of teaching: give the model enough input-output examples and it will learn a new task at inference time. Research on in-context learning points to a more limited picture. Few-shot examples mainly communicate the format, style, and task specification, activating capabilities the model already acquired in pre-training. A striking finding is that replacing the correct labels in few-shot examples with random labels often barely degrades performance, which suggests the model is picking up the pattern of the task rather than the content of the individual demonstrations. It follows that few-shot prompting cannot bridge a genuine capability gap: if a model cannot perform a type of reasoning at all, more examples will not help. The practical implication is to use few-shot examples to clarify what you want, not to teach the model something new. If zero-shot fails, few-shot may still help when the failure is one of format or task specification; if few-shot also fails, the model likely lacks the capability.
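The random-label observation can be turned into a rough diagnostic. The sketch below is an assumption-laden illustration, not an established procedure: `randomize_labels` and `interpret` are hypothetical helpers, the 0.05 tolerance is arbitrary, and the three accuracy numbers are expected to come from your own evaluation runs (zero-shot, few-shot with correct labels, few-shot with randomized labels).

```python
import random
from typing import Sequence

Example = tuple[str, str]  # (input text, label); hypothetical shape


def randomize_labels(shots: Sequence[Example], label_space: Sequence[str],
                     seed: int = 0) -> list[Example]:
    """Keep each demonstration input but replace its label with a random draw."""
    rng = random.Random(seed)  # seeded so the ablation is repeatable
    return [(text, rng.choice(label_space)) for text, _ in shots]


def interpret(zero_shot: float, gold: float, random_labels: float,
              tolerance: float = 0.05) -> str:
    """Heuristic reading of three accuracy scores; the thresholds are assumptions."""
    if gold - zero_shot <= tolerance:
        return ("no meaningful gain from demonstrations: "
                "adding more examples is unlikely to help")
    if gold - random_labels <= tolerance:
        return ("gain survives random labels: "
                "the examples mostly convey format and task specification")
    return ("gain depends on correct labels: "
            "the demonstration content matters for this task")
```

For example, `interpret(0.50, 0.52, 0.51)` falls into the first branch, which matches the situation this report describes: the demonstrations are not the bottleneck.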
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:09:16.036694+00:00— report_created — created