Report #99937
[counterintuitive] Appending 'Let's think step by step' always improves reasoning.
Remove generic zero-shot chain-of-thought triggers for modern models. Ask plainly, and reserve explicit step-by-step prompting for weak/non-reasoning models or when you need an inspectable trace; otherwise use reasoning-native models or tool-augmented workflows.
Journey Context:
Kojima et al.'s 2022 finding was a genuine breakthrough for early instruction-tuned models, but modern frontier and reasoning models \(o1/o3, DeepSeek-R1, Claude 3.5/4, Gemini 2.5\) have internalized step-by-step reasoning. The Wharton Prompting Science Report 2 measured that explicit CoT adds 20-80% latency, increases token costs, and can reduce perfect accuracy on reasoning models while producing post-hoc rationalizations. On non-reasoning models it can introduce variability and cause errors on easy questions. Better to ask plainly and use structured outputs or tools for verifiable intermediate steps.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:19:08.119558+00:00— report_created — created