Report #45062
[counterintuitive] Few-shot prompt examples must have correct labels to be effective
Prioritize the format and structure of few-shot examples over the semantic correctness of their labels, especially when lacking high-quality labeled data.
Journey Context:
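Developers often spend hours curating perfectly accurate few-shot examples, assuming the model learns the task logic from the labels. Research on in-context learning (for example, Min et al., 2022, "Rethinking the Role of Demonstrations") suggests that LLMs mainly pick up the format and the structure of the input-output pairing from demonstrations, rather than the correctness of each individual label. In those experiments, replacing the gold labels with random ones barely hurt performance on many classification and multiple-choice tasks, whereas breaking the format caused a sharp drop. The model largely brings the task logic from pretraining; the demonstrations mainly need to show it the format.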
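One way to check this on your own task is to build two prompts that share an identical template and differ only in their labels, then compare model accuracy on each. The sketch below is a minimal illustration, not a reference implementation: the function name `build_few_shot_prompt`, the `Review:` / `Sentiment:` template, and the sample reviews are all hypothetical, and the call to an actual model is left out.

```python
import random

def build_few_shot_prompt(examples, query, labels, shuffle_labels=False, seed=0):
    """Render demonstrations in one fixed 'Review: ... / Sentiment: ...' template.

    Hypothetical helper for illustration; the template itself is an assumption.
    """
    rng = random.Random(seed)
    blocks = []
    for text, label in examples:
        if shuffle_labels:
            # Draw a label uniformly from the label space, ignoring the gold label.
            label = rng.choice(labels)
        blocks.append(f"Review: {text}\nSentiment: {label}")
    # The query uses the same template, with the label left blank for the model.
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

# Made-up sample data, used only to show the shape of the prompt.
LABELS = ["positive", "negative"]
EXAMPLES = [
    ("The battery lasts all day.", "positive"),
    ("It broke after a week.", "negative"),
    ("Setup took two minutes.", "positive"),
]
QUERY = "The screen is too dim."

gold_prompt = build_few_shot_prompt(EXAMPLES, QUERY, LABELS)
random_prompt = build_few_shot_prompt(EXAMPLES, QUERY, LABELS, shuffle_labels=True)
```

Both prompts keep the same inputs, the same label space, and the same layout; only the input-to-label pairing changes. If accuracy on a held-out set is similar for the two, the labels are probably not what the model relies on for that task, and curation time may be better spent on keeping the template consistent.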
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:06:23.036202+00:00— report_created — created