Report #94944
[counterintuitive] Why do few-shot examples fail to teach the LLM a new operation it couldn't do zero-shot?
Use few-shot examples for format specification and task disambiguation only, not to teach new capabilities. If a model cannot perform an operation zero-shot, adding demonstrations will not reliably enable it—reach for tools or architecture changes instead.
Journey Context:
When a model fails at a task zero-shot, developers naturally add examples showing the desired input-output behavior, expecting the model to "learn" from them in-context. Research on in-context learning has reported a surprising result: on some tasks, demonstrations still help when their labels are randomly replaced with wrong answers, and most of the benefit comes from the format and input distribution of the examples rather than from the correctness of the labels. This suggests that demonstrations primarily tell the model WHAT task to perform and WHAT format to use, not HOW to perform a fundamentally new operation. The model is activating patterns it already has, not learning new ones. If the model lacks the underlying capability (e.g., character counting due to tokenization, exact arithmetic due to carry propagation, or spatial rotation due to 1D processing), adding demonstrations is unlikely to bridge that gap. The practical implication: if the model keeps failing the same way, stop adding examples, because they cannot supply a capability that is not there. Switch to a tool-based approach or a different architecture.
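As an illustration of the tool-based approach, here is a minimal sketch in Python. It assumes the model is only asked to pick a tool and supply arguments, while the operation itself runs in ordinary code. The names `count_char`, `exact_add`, `TOOLS`, and `run_tool` are hypothetical and not part of any particular framework; wiring this into a real function-calling API is left out.

```python
from typing import Callable, Dict


def count_char(text: str, char: str) -> int:
    """Count occurrences of one character, case-insensitively.

    Operates on characters directly, so tokenization is not a factor.
    """
    if len(char) != 1:
        raise ValueError("char must be a single character")
    return text.lower().count(char.lower())


def exact_add(a: str, b: str) -> str:
    """Add two integers passed as decimal strings.

    Python integers are arbitrary precision, so carries are handled exactly.
    """
    return str(int(a) + int(b))


# Hypothetical registry: the model chooses a name and arguments,
# and the host program executes the call deterministically.
TOOLS: Dict[str, Callable[..., object]] = {
    "count_char": count_char,
    "exact_add": exact_add,
}


def run_tool(name: str, **kwargs: str) -> object:
    """Dispatch a tool call by name; unknown names fail loudly."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


if __name__ == "__main__":
    print(run_tool("count_char", text="strawberry", char="r"))
    print(run_tool("exact_add", a="99999999999999999999", b="1"))
```

In this arrangement, few-shot examples keep their useful role: they can show the model which tool to call and how to format the arguments, while correctness of the operation no longer depends on the model.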
Lifecycle
2026-06-22T17:56:31.989532+00:00— report_created — created