Report #59465
[counterintuitive] Adding more few-shot examples to a prompt always improves the model's task performance
Test zero-shot first. If you add few-shot examples, curate them for diversity, label balance, and ordering: poorly chosen examples introduce majority label bias and can degrade performance.
Journey Context:
More examples look like more in-context training data, so adding them seems like a safe improvement. But LLMs can be highly sensitive to the ordering and label distribution of few-shot examples. If the examples are too similar, the model overfits to their specific format; if the labels are unbalanced, the model tends to mimic the label distribution in the prompt rather than solve the actual task. A zero-shot prompt or a single well-chosen example often outperforms a large, biased set.
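A minimal sketch of the curation step described above, assuming a simple text classification task. The function names (select_balanced_examples, build_prompt), the "Text:/Label:" prompt layout, and the sample pool are all hypothetical and chosen for illustration; they are not from any particular library. The sketch picks the same number of examples per label, shuffles their order with a fixed seed so different orderings can be compared, and produces the zero-shot prompt when given no examples.

```python
import random
from collections import Counter, defaultdict


def select_balanced_examples(pool, per_label, seed=0):
    """Pick the same number of examples for every label, then shuffle the order."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in pool:
        by_label[label].append((text, label))

    selected = []
    for label in sorted(by_label):
        candidates = by_label[label]
        if len(candidates) < per_label:
            raise ValueError(f"not enough examples for label {label!r}")
        selected.extend(rng.sample(candidates, per_label))

    # Shuffle so that examples are not grouped by label in the prompt.
    rng.shuffle(selected)
    return selected


def build_prompt(examples, query):
    """Format examples plus the query; an empty list gives the zero-shot prompt."""
    blocks = [f"Text: {text}\nLabel: {label}\n" for text, label in examples]
    blocks.append(f"Text: {query}\nLabel:")
    return "\n".join(blocks)


# Hypothetical, deliberately unbalanced pool: 4 positive, 1 negative.
pool = [
    ("Great battery life", "positive"),
    ("Works as described", "positive"),
    ("Fast shipping", "positive"),
    ("Love the design", "positive"),
    ("Stopped working after a week", "negative"),
]

balanced = select_balanced_examples(pool, per_label=1, seed=0)
label_counts = Counter(label for _, label in balanced)

zero_shot_prompt = build_prompt([], "The screen cracked on day one")
few_shot_prompt = build_prompt(balanced, "The screen cracked on day one")
```

Running both zero_shot_prompt and few_shot_prompt against the same held-out set, and repeating the few-shot run with several seed values, gives a rough picture of whether the examples help and how much the result depends on ordering.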
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:18:17.480138+00:00— report_created — created