Report #50033
[counterintuitive] Does the order of few-shot examples affect LLM performance?
Randomize the order of few-shot examples across test runs to measure how much results vary, and use a validation set to pick an ordering that performs well. If the variance across orderings is still too high, switch to instruction-based prompting.
Journey Context:
Developers often append a few static examples to a prompt and assume the model generalizes equally from all of them. Research shows that LLMs are highly sensitive to the order in which few-shot examples appear. A particular ordering can accidentally trigger spurious correlations, majority-label bias, or recency bias (e.g., if the last three examples are all positive, the model tends to predict positive). The performance variance caused by ordering alone can be larger than the variance between entirely different models.
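A minimal sketch of the validation-set check, not a definitive implementation. All names here (`build_prompt`, `sample_orderings`, `evaluate_orderings`, the `Input:`/`Label:` prompt template) are hypothetical, and `predict` is an assumed stand-in for whatever function sends a prompt to your model and returns a label. The stub at the bottom is not an LLM: it deliberately copies the label of the last example, only to make the ordering effect visible without an API call.

```python
import itertools
import math
import random
from typing import Callable, Sequence

Example = tuple[str, str]  # (input text, label)


def build_prompt(examples: Sequence[Example], query: str) -> str:
    """Render the few-shot examples in the given order, then the query."""
    shots = "\n\n".join(f"Input: {text}\nLabel: {label}" for text, label in examples)
    return f"{shots}\n\nInput: {query}\nLabel:"


def sample_orderings(
    examples: Sequence[Example], max_orderings: int, rng: random.Random
) -> list[list[Example]]:
    """Return every permutation if there are few, else distinct random shuffles."""
    n = len(examples)
    if math.factorial(n) <= max_orderings:
        return [list(p) for p in itertools.permutations(examples)]
    seen: dict[tuple[int, ...], None] = {}  # dict keeps insertion order
    while len(seen) < max_orderings:
        seen[tuple(rng.sample(range(n), n))] = None
    return [[examples[i] for i in order] for order in seen]


def accuracy(
    predict: Callable[[str], str],
    examples: Sequence[Example],
    validation: Sequence[Example],
) -> float:
    """Fraction of validation queries labeled correctly with this ordering."""
    hits = sum(predict(build_prompt(examples, q)) == gold for q, gold in validation)
    return hits / len(validation)


def evaluate_orderings(
    predict: Callable[[str], str],
    examples: Sequence[Example],
    validation: Sequence[Example],
    max_orderings: int = 24,
    seed: int = 0,
) -> dict:
    """Score several orderings and report the best one plus the spread."""
    rng = random.Random(seed)
    scored = [
        (accuracy(predict, order, validation), order)
        for order in sample_orderings(examples, max_orderings, rng)
    ]
    best_score, best_order = max(scored, key=lambda pair: pair[0])
    worst_score = min(score for score, _ in scored)
    return {
        "best_order": best_order,
        "best": best_score,
        "worst": worst_score,
        "spread": best_score - worst_score,
    }


def recency_biased_stub(prompt: str) -> str:
    """Toy stand-in for a model: repeats the label of the last few-shot example."""
    labels = [
        line.removeprefix("Label: ")
        for line in prompt.splitlines()
        if line.startswith("Label: ")
    ]
    return labels[-1]


shots = [("great", "positive"), ("awful", "negative"), ("loved it", "positive")]
validation = [
    ("superb", "positive"),
    ("nice", "positive"),
    ("solid", "positive"),
    ("dull", "negative"),
]
report = evaluate_orderings(recency_biased_stub, shots, validation)
print(report)
```

With a real model, replace `recency_biased_stub` with your own call and keep the validation set separate from the final test set, since the chosen ordering is fitted to it. A large `spread` is the signal that few-shot variance may be too high for the task.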
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:27:43.128547+00:00— report_created — created