Report #94118
[counterintuitive] If the model fails zero-shot, adding more in-context examples will teach it the missing capability
Distinguish between format/style guidance (few-shot helps) and capability gaps (few-shot does not help). If a task requires an operation the model's architecture cannot perform, such as character manipulation, precise arithmetic, or systematic state tracking, no number of examples will bridge the gap. Use tools or different architectures instead.
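As a rough illustration of the "use tools" advice, the sketch below routes the operations named above to deterministic functions and only falls back to the model for everything else. The names `TOOLS`, `run_task`, and `ask_model` are hypothetical and chosen for this example; `ask_model` stands in for whatever client function sends a prompt and returns text.

```python
from typing import Any, Callable, Dict


def reverse_text(text: str) -> str:
    """Reverse a string character by character."""
    return text[::-1]


def exact_product(a: int, b: int) -> int:
    """Multiply two integers exactly (Python ints have arbitrary precision)."""
    return a * b


# Hypothetical registry mapping a task name to a deterministic tool.
TOOLS: Dict[str, Callable[..., Any]] = {
    "reverse_text": reverse_text,
    "exact_product": exact_product,
}


def run_task(task: str, args: Dict[str, Any], ask_model: Callable[[str], str]) -> str:
    """Use a registered tool when one exists; otherwise ask the model."""
    tool = TOOLS.get(task)
    if tool is not None:
        return str(tool(**args))
    # No tool registered: assume the task is within the model's capabilities.
    return ask_model(f"Task: {task}\nInput: {args}")
```

In this arrangement the model (or the calling code) only has to pick the right tool and arguments, which is a format problem that few-shot examples can help with, while the operation itself is computed exactly.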
Journey Context:
Few-shot examples are pattern demonstrations, not training data. They help the model understand what output format, style, or reasoning pattern you expect; they activate existing capabilities rather than create new ones. If a task requires an operation outside the model's capability (e.g., reversing a string character by character, tracking chess board state, computing large products), providing 50 examples of the input-output pattern will not help. The model will interpolate between examples for similar-looking inputs but cannot extrapolate to a genuinely new computational primitive. The results in the original GPT-3 paper ("Language Models are Few-Shot Learners", Brown et al., 2020) are consistent with this: few-shot prompting improved performance on tasks within the model's capability envelope but did not produce capabilities the model lacked at that scale. The diagnostic: if the model fails consistently on a task type regardless of the number and quality of examples, it is a capability gap, not a demonstration gap.
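The diagnostic above can be sketched as a small shot-count sweep: measure accuracy at several numbers of in-context examples and check whether it ever moves. Everything here is an assumption made for illustration, including the prompt layout in `build_prompt`, exact-match scoring, and the `target` and `min_gain` thresholds, which should be tuned per task. `ask_model` is a hypothetical callable that takes a prompt and returns the model's text.

```python
from typing import Callable, Dict, Sequence, Tuple

Example = Tuple[str, str]  # (input, expected output)


def build_prompt(shots: Sequence[Example], query: str) -> str:
    """Format k demonstrations followed by the unanswered query."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in shots]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def accuracy_by_shots(
    ask_model: Callable[[str], str],
    pool: Sequence[Example],
    test_set: Sequence[Example],
    shot_counts: Sequence[int],
) -> Dict[int, float]:
    """Exact-match accuracy on test_set for each number of shots."""
    results: Dict[int, float] = {}
    for k in shot_counts:
        shots = pool[:k]
        correct = sum(
            ask_model(build_prompt(shots, x)).strip() == y for x, y in test_set
        )
        results[k] = correct / len(test_set)
    return results


def diagnose(results: Dict[int, float], target: float = 0.9, min_gain: float = 0.2) -> str:
    """Classify the sweep; both thresholds are illustrative assumptions."""
    fewest_shots = min(results)
    best = max(results.values())
    gain = best - results[fewest_shots]
    if best >= target:
        return "demonstration gap"  # examples were enough to fix it
    if gain < min_gain:
        return "likely capability gap"  # flat curve: examples do not help
    return "inconclusive"  # improving but not solved; try better examples
```

A flat curve across, say, 0, 4, 16, and 50 shots points toward offloading the operation to a tool. A curve that climbs to the target suggests the zero-shot failure was a formatting or instruction problem. Test inputs should be disjoint from the demonstration pool, otherwise the sweep measures copying rather than generalization.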
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:33:51.173289+00:00— report_created — created