Report #59531
[counterintuitive] Adding more few-shot examples to teach an LLM a completely new task
Use few-shot examples to demonstrate the output format and disambiguate the task, not to teach new capabilities. If a model cannot perform a task zero-shot, adding examples will not reliably enable it. For genuinely new capabilities, fine-tune on training data or decompose the task into sub-tasks the model can already perform. In practice, 3-5 well-chosen examples typically capture most of the benefit; additional examples yield diminishing returns on known tasks and little to no benefit on novel ones.
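A minimal sketch of the "few, well-chosen examples" advice: a prompt builder that caps the number of demonstrations. The function name `build_few_shot_prompt`, the `Input:`/`Output:` layout, and the default cap of 5 are illustrative assumptions, not part of any library; the cap simply mirrors the 3-5 range mentioned above.

```python
from typing import Sequence

Example = tuple[str, str]  # (input text, expected output)


def build_few_shot_prompt(
    instruction: str,
    examples: Sequence[Example],
    query: str,
    max_examples: int = 5,  # assumption: mirrors the 3-5 guideline
) -> str:
    """Build a prompt that uses at most `max_examples` demonstrations.

    The demonstrations show the output format and disambiguate the task;
    they are not expected to teach the model a capability it lacks.
    """
    if max_examples < 0:
        raise ValueError("max_examples must be non-negative")

    shots = list(examples)[:max_examples]  # keep only the first k examples
    parts = [instruction.strip()]
    for text, label in shots:
        parts.append(f"Input: {text}\nOutput: {label}")
    # The final block is left open so the model completes the output.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)
```

Passing ten candidate examples with `max_examples=3` yields a prompt with three demonstrations plus the open query. Which examples to keep (here, simply the first k) is a separate selection problem that this sketch does not address.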
Journey Context:
The term 'in-context learning' is misleading: it implies the model learns from context. Research shows that few-shot examples primarily work through task recognition (activating capabilities the model already has from pre-training) rather than task learning (acquiring new skills from demonstrations). A landmark finding: replacing correct labels in few-shot examples with random labels barely hurts performance on many tasks. This suggests the model isn't 'learning' from the input-output pairs; it's using the examples to recognize which of its pre-trained capabilities to deploy. When developers add 10-20 examples for a novel task the model has never seen, they're conditioning the model on patterns it has no internal representation for, which leads to surface-level imitation without genuine capability. The model will mimic the format but fail on edge cases that require understanding the underlying logic. This is why fine-tuning (which actually updates the weights) is qualitatively different from in-context learning (which leaves them unchanged).
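The random-label finding above can be checked on your own task with a small ablation. The following is a sketch under stated assumptions: `model` is a hypothetical callable that takes a prompt string and returns the completion text (wrap whatever client you actually use), and `format_prompt`, `randomize_labels`, `accuracy`, and `label_ablation` are illustrative helper names, not an existing API.

```python
import random
from typing import Callable, Sequence

Example = tuple[str, str]      # (input text, label)
Model = Callable[[str], str]   # hypothetical: prompt in, completion text out


def format_prompt(shots: Sequence[Example], query: str) -> str:
    """Render demonstrations followed by the open query."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in shots]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def randomize_labels(
    shots: Sequence[Example], label_set: Sequence[str], seed: int = 0
) -> list[Example]:
    """Keep every input but replace its label with a random one."""
    rng = random.Random(seed)
    return [(x, rng.choice(label_set)) for x, _ in shots]


def accuracy(model: Model, shots: Sequence[Example],
             eval_set: Sequence[Example]) -> float:
    """Exact-match accuracy of `model` on `eval_set` given `shots`."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    correct = sum(
        model(format_prompt(shots, query)).strip() == gold
        for query, gold in eval_set
    )
    return correct / len(eval_set)


def label_ablation(model: Model, shots: Sequence[Example],
                   label_set: Sequence[str], eval_set: Sequence[Example],
                   seed: int = 0) -> dict[str, float]:
    """Compare accuracy with gold labels vs. randomly assigned labels."""
    noisy = randomize_labels(shots, label_set, seed)
    return {
        "gold": accuracy(model, shots, eval_set),
        "random": accuracy(model, noisy, eval_set),
    }
```

A small gap between the two scores is consistent with task recognition: the demonstrations are signalling the task and format rather than teaching the input-to-label mapping. A single seed and a small evaluation set give a noisy estimate, so treat the result as a hint rather than a measurement.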
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:24:41.708837+00:00— report_created — created