Report #51251
[counterintuitive] Model fails at a novel task — adding more few-shot examples to the prompt should teach it
Distinguish between in-distribution tasks (where more examples help by clarifying format and intent) and out-of-distribution tasks (where the model lacks the fundamental capability). For OOD tasks, change the task decomposition or add tools rather than adding more examples.
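A minimal sketch of the "add tools" option, assuming a setup where the model emits a structured tool request and ordinary code performs the operation. The tool names, the registry, and the {"tool": ..., "args": ...} call shape are hypothetical and chosen for illustration; they are not taken from any specific framework.

```python
from typing import Callable, Dict

# Hypothetical tools: deterministic code handles the operations the
# model is unreliable at (here, character-level manipulation).
def count_char(text: str, char: str) -> int:
    """Count exact occurrences of a single character in text."""
    if len(char) != 1:
        raise ValueError("char must be a single character")
    return text.count(char)

def reverse_text(text: str) -> str:
    """Return text with its characters in reverse order."""
    return text[::-1]

# Illustrative registry mapping tool names to implementations.
TOOLS: Dict[str, Callable[..., object]] = {
    "count_char": count_char,
    "reverse_text": reverse_text,
}

def run_tool_call(call: dict) -> object:
    """Execute a call of the assumed form {"tool": name, "args": {...}}."""
    name = call["tool"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["args"])

# The model's job shrinks to producing this request, which is a
# format-and-intent problem that examples can help with.
call = {"tool": "count_char", "args": {"text": "strawberry", "char": "r"}}
result = run_tool_call(call)
```

With this split, few-shot examples are spent on teaching the request format, and the character-level work never depends on what the model can perceive.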
Journey Context:
Few-shot prompting works when the task is within the model's capability distribution: the examples help the model understand the format and narrow the output space. For tasks outside that capability (character manipulation, precise arithmetic, spatial reasoning on grids), more examples don't help because the model cannot generalize to the underlying operation. It pattern-matches to the examples but fails on novel inputs that differ from them. Research on in-context learning suggests it works primarily by activating capabilities already learned during pretraining, not by acquiring new ones from the context. Adding 20 examples of character counting won't teach the model to see characters; it will only teach it to mimic the pattern on strings that look like the examples. The model is interpolating, not learning.
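One way to check whether a task falls on the interpolation side is to score the prompted model separately on inputs that resemble the few-shot examples and on inputs that do not. The sketch below assumes a hypothetical `predict` callable that wraps the prompted model; the `toy_predict` function is only a stand-in that memorizes its examples so the harness can run, and its scores are not measurements of any real model.

```python
from typing import Callable, Iterable, Tuple

Example = Tuple[str, str]  # (input, expected output)

def accuracy(predict: Callable[[str], str], cases: Iterable[Example]) -> float:
    """Fraction of cases where the prediction exactly matches the expected output."""
    cases = list(cases)
    if not cases:
        raise ValueError("no cases to score")
    correct = sum(1 for x, y in cases if predict(x).strip() == y)
    return correct / len(cases)

def generalization_gap(
    predict: Callable[[str], str],
    near_cases: Iterable[Example],
    novel_cases: Iterable[Example],
) -> float:
    """Accuracy on example-like inputs minus accuracy on novel inputs."""
    return accuracy(predict, near_cases) - accuracy(predict, novel_cases)

# Toy stand-in (NOT a model): answers from a memorized table and
# falls back to a fixed guess, mimicking pure pattern-matching.
MEMORIZED = {"banana": "3", "salad": "2"}  # count of the letter "a"

def toy_predict(text: str) -> str:
    return MEMORIZED.get(text, "3")

near = [("banana", "3"), ("salad", "2")]       # look like the examples
novel = [("abracadabra", "5"), ("xyz", "0")]   # differ from the examples

gap = generalization_gap(toy_predict, near, novel)
```

A gap that stays large as more examples are added is a hint that the task is outside the model's capability and that decomposition or tools are the better lever.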
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:30:50.907931+00:00— report_created — created