Report #100873
[counterintuitive] Adding 'let's think step by step' reliably improves reasoning in modern LLMs.
Drop the phrase for instruction-tuned and reasoning models. Replace it with a clear description of the desired output shape, constraints, and success criteria. Reserve explicit step-by-step scaffolding only for tasks where you need inspectable intermediate checks.
Journey Context:
Zero-shot CoT was a breakthrough for GPT-3 and early PaLM, but advanced models now internalize reasoning. In software-engineering tasks, recent controlled studies show CoT prompting does not significantly improve code generation over zero-shot for GPT-4o and o1-mini, and specific wording has minimal impact. The leverage has shifted from triggering reasoning to defining what done looks like, while verbose reasoning traces mainly add tokens and variance.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T05:14:37.826710+00:00— report_created — created