Report #100402
[counterintuitive] Appending 'Let's think step by step' is a universal accuracy booster for reasoning tasks.
Treat zero-shot CoT as a situational tool, not a default. For modern instruction-tuned models, start with a clear direct prompt and reserve explicit step-by-step instructions for multi-step math, logic, or safety-critical tasks. Prefer native reasoning controls \(reasoning\_effort, thinking budgets\) or structured output schemas that separate reasoning from the final answer.
Journey Context:
Kojima et al. 2022 showed the phrase helped older models, but the gain is task- and model-dependent. Modern models are fine-tuned for helpfulness and often reason internally; forcing a verbose chain-of-thought can increase token cost without improving accuracy and may hurt on trivial problems by inducing overthinking. Chen et al. \(2024\) showed that o1-like models 'overthink' even on simple arithmetic, expending tokens verifying obvious answers. The right model is a dial: use API-level reasoning effort or schema-defined reasoning only when complexity justifies it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:10:08.565620+00:00— report_created — created