Report #94966
[counterintuitive] Should I always use chain-of-thought prompting for code generation?
Use chain-of-thought prompting for reasoning-heavy tasks: algorithm design, debugging complex logic, architectural decisions. Skip it for implementation tasks where the pattern is well-established: standard CRUD, common algorithms, boilerplate generation. For implementation tasks, direct prompting with clear specifications outperforms CoT.
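The decision rule can be expressed as a small prompt router. This is a minimal sketch, not a prescribed API. The `TaskKind` categories are hypothetical and simply mirror the examples listed here. The exact instruction wording ("Think step by step...", "Return only the implementation...") is also an assumption that you would tune for your model.

```python
from enum import Enum


class TaskKind(Enum):
    # Reasoning-heavy work, where chain-of-thought tends to help.
    ALGORITHM_DESIGN = "algorithm_design"
    DEBUGGING = "debugging"
    ARCHITECTURE = "architecture"
    # Well-established implementation patterns, where direct prompting is preferred.
    CRUD = "crud"
    COMMON_ALGORITHM = "common_algorithm"
    BOILERPLATE = "boilerplate"


# Hypothetical grouping; move categories between sets after measuring on your workload.
REASONING_TASKS = {TaskKind.ALGORITHM_DESIGN, TaskKind.DEBUGGING, TaskKind.ARCHITECTURE}


def build_prompt(kind: TaskKind, spec: str) -> str:
    """Return a prompt; only reasoning-heavy tasks get a step-by-step instruction."""
    if kind in REASONING_TASKS:
        return (
            f"{spec}\n\n"
            "Think step by step: analyze the problem and weigh alternatives "
            "before writing the final code."
        )
    # Implementation task: rely on a clear spec and ask for the code directly.
    return f"{spec}\n\nReturn only the implementation, no explanation."
```

For example, `build_prompt(TaskKind.DEBUGGING, spec)` appends the step-by-step instruction, while `build_prompt(TaskKind.CRUD, spec)` asks for the code directly. Keeping the classification explicit makes it cheap to revise once you have data on which categories actually benefit.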
Journey Context:
Chain-of-thought prompting improves performance on tasks requiring multi-step reasoning; this is well-established. But the original Wei et al. (2022) paper noted that CoT primarily helps on tasks where direct pattern matching is insufficient. For code tasks where the model has strong pattern-matching ability (implementing a standard endpoint, writing a well-known algorithm), forcing step-by-step reasoning can degrade performance by leading the model away from its well-trained patterns toward a reasoning path that introduces errors. This mirrors human expertise: skilled performers do worse when forced to verbalize automatic processes (the centipede's dilemma). The practical implication is that 'think step by step' is not a universal improvement; it's a tool that helps for reasoning but can hurt for pattern execution. The tradeoff: CoT adds token cost and latency even when it doesn't help, making indiscriminate use both slower and less accurate for implementation tasks.
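Whether CoT is slower and less accurate on your implementation tasks is worth checking on your own workload. The sketch below is a minimal A/B harness under stated assumptions. You supply `generate` (a wrapper around your model call) and `passes` (your own test runner). Both are hypothetical hooks. The whitespace-based token count is only a rough proxy for real tokenizer output.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class VariantResult:
    name: str
    pass_rate: float
    avg_output_tokens: float


def compare_variants(
    tasks: list[str],
    variants: dict[str, Callable[[str], str]],
    generate: Callable[[str], str],
    passes: Callable[[str, str], bool],
) -> list[VariantResult]:
    """Run every task under each prompt variant; record pass rate and output length.

    `generate` and `passes` are hypothetical hooks you provide. Output length is
    a crude whitespace-split proxy for tokens, not a real tokenizer count.
    """
    if not tasks:
        raise ValueError("need at least one task")
    results = []
    for name, make_prompt in variants.items():
        passed = 0
        tokens = 0
        for task in tasks:
            output = generate(make_prompt(task))
            tokens += len(output.split())
            if passes(task, output):
                passed += 1
        results.append(VariantResult(name, passed / len(tasks), tokens / len(tasks)))
    return results
```

Run it separately over a set of representative implementation tasks and a set of reasoning-heavy ones. A CoT variant that spends more output tokens without a higher pass rate on the implementation set would match the tradeoff described here.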
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:58:56.372966+00:00— report_created — created