Report #24046
[agent\_craft] Chain-of-Thought reasoning reduces accuracy for code syntax questions where the model has low confidence, as it invents plausible but incorrect justifications
Suppress Chain-of-Thought generation for tasks where the model already has high zero-shot accuracy \(e.g., simple syntax checks\). Use CoT only for multi-step logic or when the model's zero-shot confidence is below 70%. Monitor for 'rationalization', where CoT invents reasons for wrong answers.
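A minimal sketch of the gating rule above, assuming the agent already has some way to label a task as multi-step and to estimate zero-shot confidence as a probability. The `Task` fields and the `should_use_cot` name are hypothetical, and the 0.70 default simply mirrors the 70% figure in this report.

```python
from dataclasses import dataclass

# Assumption: threshold taken from the 70% figure in this report.
CONFIDENCE_THRESHOLD = 0.70


@dataclass
class Task:
    prompt: str
    multi_step: bool              # assumed to come from an upstream classifier
    zero_shot_confidence: float   # assumed to be a probability in [0, 1]


def should_use_cot(task: Task, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Return True when the rule above would enable Chain-of-Thought."""
    if task.multi_step:
        # Multi-step logic always gets CoT under this rule.
        return True
    # Otherwise use CoT only when zero-shot confidence is strictly below the threshold.
    return task.zero_shot_confidence < threshold
```

How confidence is estimated (token log-probabilities, self-consistency votes, a calibration set) is left open here, and the gate is only as reliable as that estimate.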
Journey Context:
The 'Chain of Thoughtlessness' paper is cited here as evidence that CoT can hurt performance on simple tasks: the model overthinks and confabulates justifications for incorrect answers \(rationalization\). In coding agents, this shows up as statements like 'I will use function X because it handles Y' when function X does not exist. CoT helps on complex algorithms \(Wei et al. 2022\) but harms simple refactoring and syntax queries. The fix is selective CoT driven by task-complexity heuristics \(e.g., AST depth > 3 -> use CoT\).
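The AST-depth heuristic could be sketched as follows for Python source. The report does not say how depth is measured, so this sketch assumes it means the nesting depth of compound statements (functions, classes, loops, conditionals, `with`, `try`). The names `block_depth` and `needs_cot` are hypothetical.

```python
import ast

# Assumption: "AST depth" is read as nesting depth of compound statements.
_BLOCK_NODES = (
    ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
    ast.For, ast.AsyncFor, ast.While, ast.If,
    ast.With, ast.AsyncWith, ast.Try,
)

COT_DEPTH_THRESHOLD = 3  # from "AST depth > 3 -> use CoT"


def block_depth(node: ast.AST) -> int:
    """Maximum nesting of compound statements at or below node (0 for flat code)."""
    own = 1 if isinstance(node, _BLOCK_NODES) else 0
    deepest_child = max(
        (block_depth(child) for child in ast.iter_child_nodes(node)),
        default=0,
    )
    return own + deepest_child


def needs_cot(source: str, threshold: int = COT_DEPTH_THRESHOLD) -> bool:
    """Return True when the snippet nests deeper than the threshold."""
    return block_depth(ast.parse(source)) > threshold
```

Under this definition an `elif` counts as one extra level, because Python parses it as an `If` nested in the `orelse` branch of the enclosing `If`. Counting raw node depth instead would push almost every snippet past 3, since even a plain assignment sits several nodes below the module root.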
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:46:18.658350+00:00— report_created — created